EDUCATIONAL DIAGNOSIS OF INDIVIDUAL PUPILS

I 
THE  PROBLEM 

Educational  diagnosis  presents  many  problems,  each  with  its 
specific  implications.  In  an  approach  to  the  present  study  it 
does  not  seem  imperative  to  consider  either  a  logical  organiza- 
tion of  these  problems  or  an  exhaustive  summary  of  relevant  in- 
vestigations;  however,  mention  of  a  few  problems  and  methods 
will  assist  in  the  orientation  of  this  study  in  the  general  field. 
One  phase  of  educational  diagnosis  is  based  upon  standardized 
tests  and  scales,  which  have  been  used  to  study  and  compare  the 
attainments  of  groups  of  pupils,  usually  school  grades  or  school 
systems.  The  average  or  median  achievement  of  different  groups, 
the  extent  of  overlapping,  the  amount  of  variability,  and  the  dis- 
tribution of  results  have  been  used  as  measures  for  comparison. 
The  relation  of  the  attainment  of  a  group  in  one  function  to  its 
attainment  in  another  function  or  trait  has  been  studied  exten- 
sively and  expressed  by  various  formulae  for  correlation.  Results 
obtained  by  standardized  measurements  have  been  compared 
with teachers' judgments, directly by teachers' rankings and indirectly
by comparison with school marks. Tests of the same
function  or  trait  have  been  compared  and  ranked  as  to  merit. 
Tests  of  different  traits  have  been  compared  and  ranked  as  to 
their  merit  in  the  evaluation  of  general  intelligence.  Mistakes 
made  most  frequently  by  the  group  have  been  studied.  These 
examples  of  the  use  of  standardized  tests  suggest  that  the  trend 
of  the  movement  in  scientific  measurements  has  been  to  em- 
phasize the  group. 

In  these  studies  and  in  even  more  extensive  ones  now  being 
undertaken, the individual, even though not lost sight of, has
not  received  as  much  attention  as  the  group.  It  would  seem 
that  greater  emphasis  should  be  placed  upon  the  measurement 
of  the  individual  and  the  interpretation  of  results  along  with  the 
measurement  of  the  group  and  the  further  development  of  the 
instrumentalities  of  measurement.  It  is  the  purpose  of  this 
study to ascertain to what extent and with what degree of reliability
standardized tests and scales can be used to discriminate
educational  attainments  of  the  individual.  Is  it  possible  to  diag- 
nose a  case  and  to  prescribe  specific  mental  work  on  the  basis  of 
achievements  in  such  tests?  The  following  are  some  of  the  ques- 
tions to  be  considered. 

1.  How  can  individual  measures  of  achievement  in  different 
tests  be  compared  or  equated  without  losing  the  refinement  of  the 
original  scores? 

2.  Do  scores  of  equal  value  in  a  given  test  necessarily  have 
the  same  meaning  for  two  or  more  individuals? 

3.  What  is  the  amount  of  the  individual's  variability  among 
the  different  tests? 

4.  How  are  the  scores  of  the  individual  distributed  with  re- 
spect to  some  measure  of  his  central  tendency? 

5. How do the bright, the mediocre, and the dull pupils compare
with each other in their variability and distribution of
achievements?

6. To what extent are there extremely variable or erratic
scores?

7.  How  do  the  bright,  the  mediocre,  and  the  dull  pupils  com- 
pare in  the  number  of  erratic  scores  which  they  make? 

8.  What  are  the  causes  of  the  extremely  variant  scores? 

9.  What  is  the  relation  between  different  measures  of  ability, 
between  different  measures  of  variability,  and  between  meas- 
ures of  ability  and  variability? 

The  specific  purpose  of  this  investigation  is  to  determine  the 
individual achievements of seventy-two junior high school pupils
in  a  group  of  eleven  tests  given  at  three  different  times  during  a 
period  of  a  year  and  a  half.  The  tests  have  been  used  to  rank 
these  pupils  in  achievement,  to  determine  the  amount  of  varia- 
bility of  the  group  in  a  single  test,  and  to  determine  the  amount 
of  variability  of  the  individual  in  the  eleven  tests.  That  the 
data  obtained  from  these  tests  are  valuable  in  the  general  di- 
rection of  the  work  of  these  pupils  has  been  demonstrated  at  the 
Speyer School of Teachers College. That such data can be used
to advantage in the prescription of special work in certain cases
is a logical assumption. This, however, should be tested by practice
and by further experimentation.


II 

PRELIMINARY  INVESTIGATION 

The  purpose  of  this  section  of  the  study  is  to  answer  the  first 
question  proposed  in  the  statement  of  the  problem,  namely: 
How can individual measures of achievement in different tests
be  compared  or  equated  without  losing  the  refinement  of  the 
original  scores?  This  section  is  introduced  not  only  to  describe 
a  way  of  equating  measures  but  also  to  compare  two  methods  of 
equating  measures  of  achievement  in  different  tests  and  to  de- 
termine by  which  method  more  reliable  results  can  be  obtained. 
Special  emphasis  is  placed  upon  the  classification  of  extremely 
variable  or  erratic  scores. 

The  data  for  the  preliminary  investigation  consist  of  the 
scores  of  ninety-seven  seventh  grade  boys  in  eleven  standardized 
tests  and  scales  given  in  February,  1916.  The  tests  are:  Woody 
Arithmetic Scales, Series A, Multiplication and Division; Trabue
Completion-Test Language Scales, Scale B and Scale C; Thorndike
Reading Scale Alpha 2, Part II; Thorndike Reading Scale
A, Visual Vocabulary; Composition, scored by the Hillegas Scale
for  the  Measurement  of  Quality  in  English  Composition;  Ayres 
Measuring  Scale  for  Ability  in  Spelling;  Woodworth  and  Wells 
Association  Tests,  Opposites,  Mixed  Relations,  and  Easy  Direc- 
tions. The  description  of  the  subjects,  the  tests,  and  the  scoring 
of  the  tests  in  Section  III,  Experimental  Material  and  Method, 
is  applicable  here  and  is  omitted  from  this  section  because  the 
chief  concern  here  is  the  evaluation  of  methods  of  statistical 
treatment. 

The  first  method  used  to  compare  individual  measures  of 
achievement will be called Classification by Rank. By this
method  the  scores  of  each  test  were  arranged  in  frequency  tables 
according  to  the  original  scores  of  the  papers.  The  scores  were 
then  turned  into  ranks.  The  highest  score  was  ranked  one  and 
the  lowest  score  was  ranked  ninety-seven.  In  cases  of  tied  scores 
the  mid-rank  of  the  interval  was  given  to  each  score.  The  eleven 
ranks  of  each  individual  were  assembled  and  arranged  in  order 
from  highest  to  lowest.  The  rank  by  the  original  distribution 
was  retained.     The  variability  of  a  score  was  then  measured 
by its distance, in terms of ranks by the original distribution,
from  the  median  rank  of  the  individual.  Obviously  the  ranks 
of  an  individual  could  to  varying  degrees  approximate  three 
forms  of  distribution, — a  distribution  skewed  downward  from  the 
median,  a  distribution  skewed  upward  from  the  median,  and  a 
distribution  approximating  the  normal  surface  of  frequency. 
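The steps of the Classification by Rank can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mid_ranks` and `rank_deviations` and all of the numbers are hypothetical and are not data from the study.

```python
from statistics import median

def mid_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mid-rank."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # best rank held by the tie group
        last = first + ordered.count(value) - 1   # worst rank held by the tie group
        rank_of[value] = (first + last) / 2
    return [rank_of[s] for s in scores]

def rank_deviations(pupil_ranks):
    """Distance of each of a pupil's ranks from that pupil's median rank."""
    centre = median(pupil_ranks)
    return [r - centre for r in pupil_ranks]

# Hypothetical scores of five pupils in one test.
print(mid_ranks([30, 25, 25, 18, 12]))        # [1.0, 2.5, 2.5, 4.0, 5.0]

# Hypothetical ranks of one pupil in five tests.
print(rank_deviations([10, 40, 42, 45, 90]))  # [-32, -2, 0, 3, 48]
```

A negative deviation here means a rank numerically smaller than the pupil's median rank, that is, a better standing in that test.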

Fig.  1.  Different  Forms  of  Distribution  of  the  Scores  of  Individuals  in 
the  Eleven  Tests. 

The  cases  in  Fig.  1  illustrate  these  forms.  These  are  actual 
cases  selected  from  the  group  of  ninety-seven  boys.  The  scale 
at  the  left  of  the  plate  represents  the  range  in  ranks  which  could 
be  obtained  in  each  test.  The  letters  refer  to  the  tests  in  which 
the  ranks  indicated  were  made.  The  case  numbers  60,  65,  and 
92  are  the  serial  numbers  which  these  boys  chanced  to  have  when 
the  names  of  the  group  were  arranged  in  alphabetical  order. 
These  are  extreme  cases,  but  only  in  the  sense  that  they  are  near 
the  limit  of  the  range  of  the  respective  forms  which  they  are 
selected  to  illustrate,  and  not  in  the  sense  that  they  are  markedly 
different from other cases.

Results obtained from this method are shown in Tables IV and
V.  These  results  will  be  discussed  in  connection  with  the  results 
from  the  other  method. 

The  second  method  will  be  called  Classification  by  Standard 
Deviation (S.D.).¹ It is like the first method only to the point
of  the  frequency  tables  of  the  original  scores.  Using  these  tables 
the  original  scores  were  transmuted  into  multiples  of  S.D.  The 
scores of each pupil in multiples of S.D. of the original distributions
were then collected and arranged in order of S.D. value
from highest to lowest. The variability of a score was then
measured by its distance, in terms of S.D. by the original distribution,
from the median score of the individual. The distribution
of all the scores in multiples of S.D. is given in Table
VI. The results show that there are thirty-four scores which
deviate from the medians of the respective individuals by more
than 2 S.D.

TABLE I

The 34 Most Erratic Scores Distributed by Quartiles According
to the Different Classifications

Classification by Rank
Classification by S.D.
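A minimal sketch of the Classification by S.D. follows. The text does not say from what origin the multiples of S.D. were measured, so the sketch assumes the group mean and the population form of the standard deviation; `to_sd_units`, `erratic`, and the sample numbers are hypothetical.

```python
from statistics import mean, median, pstdev

def to_sd_units(scores):
    """Express each score of one test as a multiple of that test's S.D.

    Assumption: deviations are taken from the group mean and divided by
    the population standard deviation of the test.
    """
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]

def erratic(pupil_sd_scores, limit=2.0):
    """Return (test index, deviation) for scores lying more than `limit`
    S.D. from the pupil's own median score."""
    centre = median(pupil_sd_scores)
    return [(i, z - centre) for i, z in enumerate(pupil_sd_scores)
            if abs(z - centre) > limit]

# Hypothetical scores of eight pupils in one test.
print(to_sd_units([2, 4, 4, 4, 5, 5, 7, 9]))

# Hypothetical S.D. scores of one pupil in five tests; the last is erratic.
print(erratic([0.4, 0.1, -0.2, 0.3, -2.6]))
```

Because every test is expressed in the same unit, a pupil's eleven scores can be laid side by side and compared directly with his own median.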

These  two  methods  of  equating  scores  and  determining  indi- 
vidual variability  will  now  be  compared  in  order  to  arrive  at 
some  basis  for  choosing  the  one  which  will  produce  the  more 
reliable  results.  In  Table  I  the  thirty-four  most  erratic  scores 
in  each  classification  are  distributed  among  the  quartiles  of  the 
group,  the  quartiles  being  determined  from  the  median  ranks 
of  the  individuals.  Quartile  I  is  the  highest.  The  table  reveals 
a  rather  marked  difference  between  the  two  classifications.  By 
the  Classification  by  Rank  the  erratic  scores  are  distributed 
quite  evenly  among  the  four  quartiles.  The  Classification  by 
S.D.  produces  decidedly  the  greatest  number  of  erratic  scores  in 
Quartile  IV,  while  it  produces  relatively  few  in  Quartile  I. 
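The quartile tally described above can be sketched as follows. The text does not state how ninety-seven pupils were divided into four quartiles, so the splitting rule below is an assumption, and the pupil labels and median ranks are hypothetical.

```python
from collections import Counter

def quartile_of(pupil_median_ranks):
    """Map each pupil to a quartile 1..4; quartile 1 holds the best
    (numerically smallest) median ranks. The split rule is an assumption."""
    ordered = sorted(pupil_median_ranks, key=pupil_median_ranks.get)
    n = len(ordered)
    return {pupil: (position * 4) // n + 1
            for position, pupil in enumerate(ordered)}

def tally_by_quartile(erratic_pupils, quartiles):
    """Count erratic scores per quartile; one list entry per erratic score."""
    counts = Counter(quartiles[p] for p in erratic_pupils)
    return [counts.get(q, 0) for q in (1, 2, 3, 4)]

# Hypothetical median ranks of eight pupils.
medians = {"a": 5, "b": 20, "c": 33, "d": 47, "e": 52, "f": 60, "g": 71, "h": 88}
quartiles = quartile_of(medians)

# Hypothetical owners of five erratic scores (pupil "h" has two).
print(tally_by_quartile(["h", "h", "g", "c", "a"], quartiles))  # [1, 1, 0, 3]
```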

Another way of comparing these methods is by dividing the
34 most erratic scores of each classification into the number
above the median and the number below the median score of
the individual. This is done in Table II. By this comparison
the Classification by S.D. differs very decidedly from the Classification
by Rank. By S.D. the number of erratic scores below
the median greatly exceeds the number above the median; by
Rank the number of scores above and the number of scores
below the median are about equal.

1 Mean Square Deviation.

TABLE II

The 34 Most Erratic Scores of Each Classification Compared
According to the Number Above and the Number Below
the Median Score of the Individual

                            Above the     Below the
                            Median        Median
Classification by Rank         18            16
Classification by S.D.          4            30

That  these  two  methods  do  not  affect  the  same  scores  in  a  dif- 
ferent way,  as  might  be  inferred  from  Table  II,  is  shown  by  the 
fact  that  of  the  34  most  erratic  scores  in  each  classification  only 
eight  are  common   to  both   classifications. 

The  last  method  of  comparing  the  classifications  directly  was 
by  distributing  the  34  most  variable  scores  among  the  eleven 
tests.  The  results  are  brought  together  in  Table  III.  Here 
also  there  are  some  rather  marked  differences  between  the  two 
classifications.  The  greatest  contrast  in  the  number  of  erratic 
scores  is  found  in  the  case  of  spelling. 

TABLE III

Distribution of the 34 Most Erratic Scores Among the Tests
According to the Different Classifications

3  1  7  6  3  0  1  0  2  8  3  2  2

Indirectly  further  comparison  of  the  two  classifications  can 
be  made  by  a  study  of  Tables  IV,  V,  and  VI.  Table  IV  shows 
that  the  average  range  of  pupils  who  stand  highest  and  of  those 
who  stand  lowest  is  considerably  less  than  that  of  those  of  aver- 
age ability.  According  to  range  in  ranks  above  and  below  the 
median score of the individual the four quartiles have a close inverse
relation. The average range above the median of the first
quartile  is  approximately  the  same  as  the  average  range  below 
the  median  of  the  fourth  quartile.  The  inverse  relation  holds 
right  through  to  the  average  range  above  the  median  of  the 
fourth  quartile  which  is  approximately  the  same  as  the  average 
range  below  the  median  of  the  first  quartile.  The  range  between 
the last two scores at each end of the distribution of the individual's
scores has an inverse relation similar to that of the average
range above and below the median. In average S.D. the quartiles
have about the same relation as in average range.

TABLE IV

Summary of Variation in Ranks of the 97 Pupils in the
Eleven Tests

                                              Av. Interval
                        Average Range         between Last
                                              Two Scores
             Av.        Above     Below       Above     Below      Av.
Quartile     Range      Med.      Med.        Med.      Med.       S.D.
I            69.8       23.3      46.6        17.8       3.2       22.2
II           81.6       36.2      45.4        13.1       8.2       26.9
III          78.5       44.7      33.8         6.6      10.9       26.4
IV           67.6       45.9      21.8         2.3      15.6       22.2
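The inverse relation described for Table IV can be checked with simple arithmetic. The sketch below transcribes the first three numeric columns of Table IV as read from the printed table (an assumption about the column order) and confirms that the range above and the range below the median add up to the average range within rounding, and that Quartile I and Quartile IV roughly mirror each other.

```python
# (average range, average range above median, average range below median),
# transcribed from Table IV; the column assignment is an assumption.
table_iv = {
    "I":   (69.8, 23.3, 46.6),
    "II":  (81.6, 36.2, 45.4),
    "III": (78.5, 44.7, 33.8),
    "IV":  (67.6, 45.9, 21.8),
}

def range_gap(row):
    """Difference between (above + below) and the reported average range."""
    total, above, below = row
    return round(above + below - total, 1)

gaps = {quartile: range_gap(row) for quartile, row in table_iv.items()}

# Mirror comparison: above-median range of one extreme quartile against
# the below-median range of the other.
mirror = {
    "I above vs IV below": round(table_iv["I"][1] - table_iv["IV"][2], 1),
    "IV above vs I below": round(table_iv["IV"][1] - table_iv["I"][2], 1),
}

print(gaps)
print(mirror)
```

The small mirror differences (about one and a half ranks or less) are what the text calls a close inverse relation.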

Table  V  is  a  distribution  of  the  ranges  in  ranks.  From  this 
the  median  range  in  ranks,  according  to  the  original  distribution, 
is  found  to  be  76.6  which  is  82  per  cent  of  the  maximum  possible 
range.  Taken  at  its  absolute  value  this  seems  to  be  a  high  per 
cent.  Whether  or  not  its  relative  value  is  high  must  await  fur- 
ther investigation.  This  question  will  be  considered  further  in 
connection  with  Table  XIII  in  Section  V. 

TABLE V

Distribution of Ranges in Ranks of the 97 Pupils in the
Eleven Tests

Value                        Value
in Ranks     Frequency       in Ranks     Frequency
90 to 94          8          60 to 64          8
85 to 89         17          55 to 59          5
80 to 84         14          50 to 54          2
75 to 79         15          45 to 49
70 to 74         13          40 to 44          1
65 to 69         12          35 to 39
                             30 to 34          2

Table VI is a distribution of the total number of scores according
to their S.D. distance, in terms of the original distribution,
from  the  median  score  of  the  individual.  Its  chief  significance 
lies  in  the  fact  that  there  are  more  extreme  or  erratic  scores 
below  the  median  than  there  are  above,  and  also  that  they  are 
more  widely  scattered. 


TABLE VI

Distribution of Scores of the 97 Pupils in the Eleven Tests
According to S.D. Distance from the Median Score
of the Individual

Value in S.D.     Frequency       Value in S.D.     Frequency
+2.6 to +3.0           3          Med. to -.5          287
+2.1 to +2.5           1          -.6 to -1.0          116
+1.6 to +2.0          21          -1.1 to -1.5          69
+1.1 to +1.5          62          -1.6 to -2.0          32
+.6 to +1.0          137          -2.1 to -2.5          16
Med. to +.5          309          -2.6 to -3.0           3
                                  -3.1 to -3.5           2
                                  -3.6 to -4.0           1
                                  -4.1 to -4.5           3
                                  -4.6 to -5.0           2
                                  -5.1 to -5.5           2
                                  -5.6 to -6.0
                                  -6.1 to -6.5           1
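As a consistency check, the frequencies of Table VI can be totalled. The sketch below assumes that the frequency for +2.6 to +3.0 is 3 and that the empty cell for -5.6 to -6.0 is zero; both are readings of the printed table, not statements from the text. Each interval is keyed by its outer bound in S.D. units.

```python
# Frequencies from Table VI, keyed by the outer bound of each interval.
above = {0.5: 309, 1.0: 137, 1.5: 62, 2.0: 21, 2.5: 1, 3.0: 3}
below = {0.5: 287, 1.0: 116, 1.5: 69, 2.0: 32, 2.5: 16, 3.0: 3, 3.5: 2,
         4.0: 1, 4.5: 3, 5.0: 2, 5.5: 2, 6.0: 0, 6.5: 1}

def beyond(freqs, limit=2.0):
    """Number of scores in intervals lying wholly beyond `limit` S.D."""
    return sum(n for bound, n in freqs.items() if bound > limit)

total_scores = sum(above.values()) + sum(below.values())
erratic_above = beyond(above)
erratic_below = beyond(below)

print(total_scores)                  # 1067, i.e. 97 pupils x 11 tests
print(erratic_above, erratic_below)  # 4 30
```

Under these readings the totals agree with the rest of the section: 1067 scores in all, and 4 + 30 = 34 scores more than 2 S.D. from the individual's median, matching the S.D. row of Table II.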

With  the  results  of  the  preliminary  investigation  at  hand  the 
question  was  to  decide  which  method  of  classifying  an  individ- 
ual's scores  should  be  used  to  secure  the  more  reliable  results. 

The  Classification  by  Rank  does  not  show  as  much  evidence 
of  reliability  for  a  study  of  this  kind  as  a  classification  by  some 
measure  of  variability.  A  significant  weakness, — the  one  which 
is  the  reason  for  its  elimination  in  an  investigation  of  this  type 
of  problem — is  the  fact  that  unequal  differences  between  original 
scores  cannot  be  discriminated.  By  the  frequency  tables  of  the 
original scores two items, one in each of two contiguous intervals
in one table, are ranked consecutively; likewise two other
items of the same table, although one is removed several intervals
from the other, are ranked consecutively if there are no
items intervening. In studies where the scores near the central
tendency are of chief concern this defect is not of so much importance,
but in this study it seems to make the method unreliable.
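The weakness of ranking can be seen in a small example. The scores below are hypothetical and contain no ties; the helper names are illustrative only.

```python
def ranks_desc(scores):
    """Rank distinct scores with the highest ranked 1 (no ties assumed)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def consecutive_gaps(values):
    """Differences between neighbouring values after sorting ascending."""
    ordered = sorted(values)
    return [b - a for a, b in zip(ordered, ordered[1:])]

scores = [40, 39, 38, 20]               # hypothetical scores in one test
ranks = ranks_desc(scores)              # [1, 2, 3, 4]

score_gaps = consecutive_gaps(scores)   # [18, 1, 1]
rank_gaps = consecutive_gaps(ranks)     # [1, 1, 1]
print(score_gaps, rank_gaps)
```

The pupil scoring 20 stands eighteen points below his neighbour, yet in ranks he is only one step away, exactly as the pupils scoring 39 and 40 are. An extreme score therefore looks no more extreme than an ordinary one once it is turned into a rank.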

The Classification by S.D. overcomes the weaknesses pointed
out in connection with the other one. The value taken for unity
is the amount of variability which is "one of the most constant
things about a variable fact . . . ."¹ Different magnitudes
of  the  original  frequency  tables  are  preserved  because  they  are 
expressed  in  percentages  of  a  constant.  Therefore  the  Classi- 
fication by  S.D.  appears  to  be  the  superior  method. 

After the method of Classification by S.D. was chosen as being
the more reliable, and the work of this study by a similar method
was well under way, further confirmation of its reliability was
discovered. This will be discussed in Section IV.

The  first  question  asked  in  Section  I  concerning  the  comparing 
or  equating  of  measures  of  achievement  of  the  individual  in 
different  tests  has  now  been  answered.  Two  measures,  Classi- 
fication by  Rank  and  Classification  by  S.D.,  have  been  described 
and  the  results  produced  by  each  have  been  compared.  The 
Classification  by  S.D.  shows  marked  superiority  over  the  other 
classification.  Its  use  seems  to  produce  more  reliable  results, — 
results  by  which  individual  achievements  can  be  compared  or 
equated while still retaining practically all the refinement of the original
scores.


1 Trabue, M. R., Completion-Test Language Scales, Teachers College,
Columbia University, Contributions to Education, No. 77, p. 30.


III

EXPERIMENTAL MATERIAL AND METHOD

1.     The  Subjects 

The  subjects  for  this  investigation  were  a  group  of  boys  in 
the  Speyer  School  of  Teachers  College,  Columbia  University. 
There  were  seventy-two  individuals  for  whom  complete  records 
were  secured  in  all  eleven  tests  in  all  three  testings,  February 
1916,  February  1917,  and  June  1917.  This  school  was  opened 
as  an  experimental  academic  junior  high  school  in  February 
1916.  The  group  which  entered  first,  about  two  hundred  in 
all,  came  from  twenty-four  classes  in  five  of  the  public  schools 
in  New  York  City,  Nos.  5,  10  B,  43,  184,  and  186.  The  seventy- 
two  subjects  for  this  study  were  among  this  group. 

Before the experimental school was opened the twenty-four
classes in the public schools were given the following tests:
Woody Arithmetic, Multiplication Scale, Series A; Trabue Completion-Test
Language Scales B and C; Composition, scored by
the Hillegas Scale; and Ayres Spelling Scale. Soon after they
entered  Speyer  School  six  additional  tests  were  given:  Woody 
Arithmetic,  Division  Scale,  Series  A;  Thorndike  Reading  Scale 
Alpha 2, Part II; Thorndike Reading Scale A, Visual Vocabulary;
Woodworth and Wells Association Tests, Opposites, Mixed
Relations,  and  Easy  Directions.  Complete  records  of  the  scores 
in  all  of  the  eleven  tests  were  secured  for  ninety-seven  of  the 
boys  entering.  These  tests  will  be  referred  to  throughout  this 
study  as  the  February  1916  tests. 

For  purposes  of  this  investigation  it  is  important  to  know 
whether  the  boys  are  a  highly  selected  group  or  whether  they 
represent  the  different  grades  of  ability  in  typical  classes  begin- 
ning the  seventh  school  grade.  Studying  these  same  boys  in  con- 
nection with  a  different  problem  Dr.  E.  K.  Fretwell  answers 
this  question  as  follows:  "It  is  noted  then  that  the  Speyer 
group  is,  on  the  basis  of  achievements  in  these  five  tests,  some- 
what better  than  the  other  group,  though  only  slightly  better. 
It should also be pointed out that this group coming to Speyer
did not cluster around the median of achievement and that there
were  all  kinds  of  pupils  from  the  brightest  to  very  nearly  the 
dullest.  On  this  point  the  estimates  of  the  twenty-four  teachers 
are in accord with the tests."¹ The five tests referred to are
those  named  above  which  were  given  before  the  school  was 
opened.  The  estimates  of  the  teachers  were  for  intelligence 
and  industry.  A  more  detailed  discussion  of  this  question  and 
also  a  fuller  description  of  the  entrance  of  these  boys  into  the 
Speyer  School  may  be  had  by  consulting  the  study  referred  to 
above. 

It  is  also  of  importance  in  connection  with  this  study  to  point 
out that the boys were divided into groups for the purpose of instruction
on the basis of their achievement in terms of their
average  rank  in  the  eleven  tests.  When  the  average  rank  for 
each  boy  was  determined  groups  of  about  twenty-five  each  were 
formed  on  the  basis  of  achievement  in  the  tests.  At  any  time 
after  this  the  teachers  by  their  combined  judgments  could  make 
any  transfers  they  considered  desirable  so  long  as  the  groups 
were  kept  approximately  the  same  in  size. 

Of  the  ninety-seven  boys  who  were  given  the  February  1916 
tests  seventy-five  were  in  Speyer  School  in  February,  1917,  and 
seventy-two  in  June,  1917,  when  the  collection  of  data  for  this 
investigation  was  finished.  The  distribution  of  these  seventy- 
two  boys  among  the  different  groups  in  June,  1916  and  June, 
1917  is  shown  in  Table  VII. 


TABLE VII

Distribution of the 72 Boys Among Groups in School

Groups           1     2     3     4     5     6
June, 1916      13    11    12    11    14    11
June, 1917      13    11    15     7    16    10

This  table  shows  that  the  seventy-two  boys  were  quite  uni- 
formly distributed  among  the  two  hundred  and  therefore  were 
not  materially  different  from  typical  seventh  grade  boys. 

After  the  February  1917  tests  were  given  the  boys  were  num- 
bered from  1  to  75  according  to  the  alphabetical  arrangement 
of their names. These serial numbers are retained throughout
the investigation. Tables XXXIX to XLI in the Appendix contain
the scores for the three testings. In February 1917 complete
records for seventy-five individuals were secured. Three
boys, Nos. 4, 22, and 37, were not present when the June 1917
tests were given. Because some of the statistical work had been
done before the last testing was made the serial numbers were
not changed to 1 to 72 but instead the three numbers noted
above were dropped.

1 Fretwell, E. K., A Study in Educational Prognosis, Teachers College,
Columbia University, Contributions to Education, No. 99.

2.     The  Administration  and  Scoring  of  the  Tests 

The standardized educational and psychological tests listed in
Table VIII were used to secure the data for studying the educational
attainments of the pupils who have been described
above. The tests and the method of administering them will
now be described briefly. References to full discussion of the
tests by their authors are given for those not already familiar
with these tests who may wish to make further study of them.

Woody Arithmetic Scales ²

The  Multiplication  Scale,  Series  A,  consists  of  thirty-nine 
problems  scaled  in  degree  of  difficulty.  The  first  problem  is 
so  easy  that  out  of  943  seventh  grade  pupils  tested  by  the  author 
of  the  scale  936  solved  it  correctly,  and  the  last  one  so  difficult 
that  of  the  same  group  only  186  solved  it  correctly.  Multipli- 
cation Scale,  Series  B,  is  composed  of  twenty  problems  selected 
from  Series  A.  It  covers  practically  the  same  range  of  difficulty 
as  does  Series  A. 

The  Division  Scale,  Series  A,  is  made  up  of  thirty-six  prob- 
lems, the  first  of  which  was  solved  by  822  out  of  940  seventh 
grade  pupils  tested  by  the  author  of  the  scale,  and  the  thirty- 
sixth  by  123.  Fifteen  problems  of  Series  A  covering  the  entire 
range  of  difficulty  compose  Series  B. 

The  time  given  was  sufficient  for  practically  all  of  the  pupils 
to  complete  the  tests.  In  accordance  with  the  recommendation 
of  the  author  "the  standard  for  marking  a  problem  correct  was 
absolute  accuracy,  and,  wherever  possible,  reduction  to  its  lowest 
terms."  One  point  was  given  for  each  correct  answer.  The 
score  for  the  individual  is  the  number  of  correct  answers. 

2 Woody, Clifford, Measurements of Some Achievements in Arithmetic,
Teachers College, Columbia University, Contributions to Education, No. 80.

Hotz Algebra Scales ³

The Hotz First Year Algebra Scales were in the process of
construction  when  the  data  for  this  study  were  secured.  The 
Addition  and  Subtraction  Scale  is  made  up  of  twenty-four  prob- 
lems scaled  in  degree  of  difficulty  from  easy  to  difficult  prob- 
lems. The  Multiplication  and  Division  Scale  is  composed  of 
twenty-three  problems.     It  is  built  on  the  same  principle. 

Trabue Completion-Test Language Scales ⁴

These tests are composed of mutilated sentences which the
subject  is  to  complete  by  filling  in  the  words  which  make  the 
"most  sensible  statement."  Scales  B,  C,  D,  and  E  consist  of 
ten  sentences  each,  scaled  so  that  they  range  in  P.E.  units  of 
value  from  about  1  to  between  10.5  and  11.  The  intervals  be- 
tween sentences  are  nearly  equal.  Scales  J  and  K  have  seven 
sentences  each,  ranging  in  value  from  a  little  more  than  4  to 
about  12.5 ;  and  L  and  M  have  eight  sentences  each,  ranging 
from  almost  7  to  a  little  above  11  P.E.  units  of  difficulty. 

Seven  minutes  were  given  for  completion  of  the  sentences  of 
each  scale.  In  this  amount  of  time  all  the  subjects  apparently 
had  opportunity  to  show  their  maximum  ability  in  such  work  for 
in  most  cases  more  sentences  were  attempted  than  were  correctly 
done.  The  method  of  scoring  was  that  suggested  by  the  author 
of  the  scales.  In  cases  where  the  lists  given  in  his  guide  for 
scoring  did  not  cover  the  answer  in  question  the  standard  de- 
cided upon  was  recorded  and  used  in  any  similar  instances.  This 
made  for  uniformity  in  scoring.  Two  points  were  given  for  each 
sentence completed correctly and one point for "each sentence
completed with only a slight imperfection."

Scale Alpha 2. For Measuring the Understanding of Sentences ⁵

Part II of this scale was used. Scale Alpha 2 is "an improved
and extended form" of "a provisional scale Alpha for
measuring ability in paragraph reading." Part II begins with
difficulty 7 and extends through difficulties 8, 8½, and 9.
There are ten paragraphs in all, concerning the meaning of
which twenty-four questions are asked. The subject's achievement
in the test is determined by his answers to the questions
asked on each paragraph.

3 Hotz, Henry G., First Year Algebra Scales, Teachers College, Columbia
University, Contributions to Education, No. 90.

4 Trabue, M. R., Completion-Test Language Scales, Teachers College,
Columbia University, Contributions to Education, No. 77.

5 Thorndike, E. L., "Measurement of Achievement in Reading," Teachers
College Record, Vol. XV, No. 4. "An Improved Scale for Measuring Ability
in Reading," Teachers College Record, Vol. XVI, No. 5 and Vol. XVII, No. 1.

The  selections  from  Beta  and  from  S  are  similar  to  the  para- 
graphs in  Alpha  2.  Because  Alpha  2  had  been  used  twice  it 
was  considered  best  not  to  repeat  it  again.  Therefore  three 
paragraphs  were  selected  from  Scale  Beta,  and  the  one  paragraph 
of  S  of  the  longer  reading  scale  was  added.  Twenty-nine  ques- 
tions are  asked  concerning  the  meaning  of  these  paragraphs. 

In  scoring  the  tests  answers  were  divided  into  three  classes, 
— correct,  slightly  incorrect,  and  wrong,  for  which  2,  1,  and  0 
points  respectively  were  given.  The  total  number  of  points  is 
the  score  given  the  individual.  The  time  was  sufficient  for  all 
but  the  very  slowest  to  do  as  much  as  they  could  with  the  test. 
As  an  aid  to  uniformity  in  scoring  record  was  made  of  types  of 
answers  concerning  which  there  was  question  as  to  their  class- 
ification. This  was  used  to  supplement  the  list  given  by  the  au- 
thor of  the  scale. 

Visual Vocabulary ⁶

The  Visual  Vocabulary  tests  consist  of  lists  of  words  which  are 
to  be  classified  accordingly  as  they  mean  a  flower,  an  animal,  a 
boy's  name,  a  game,  a  book,  something  about  time,  something 
good  to  be  or  do,  or  something  bad  to  be  or  do.  The  classi- 
fication of  the  word  is  indicated  by  writing  a  designated  letter 
or  word  under  it. 

The  Thorndike  Reading  Scale  A  was  given  in  February,  1916. 
It  consists  of  forty-three  words  arranged  by  groups  of  five  in 
ascending  degrees  of  difficulty.  The  last  group  has  only  three 
words.  The  test  given  in  February,  1917  was  made  up  of  one 
hundred  and  seventy  words  in  fourteen  groups  selected  from 
the Thorndike Scale A 2 plus four groups selected from its provisional
extension. The groups begin with step 6½ x and extend through step 12½.

The  Thorndike  Reading  Scale  B,  y  series,  was  given  in  June, 
1917.     It  consists  of  one  hundred  and  twenty  words  arranged 

6  Thorndike,  E.  L.,  "Measurement  of  Achievement  in  Reading,"  Teachers 
College  Record,  Vol.  XV,  No.  4,  and  Vol.  XVII,  No.  5. 
in groups of ten. It is built on the same principle as the other
two  tests,  using  however  a  different  list  of  meanings  to  deter- 
mine the  classification. 

Composition ⁷

The subjects for the test in composition were: for February,
1916, How I Would Spend Twenty Dollars; for February, 1917,
What I Should Like to do Next Saturday; for June, 1917, How
I Should Like to Spend My Vacation, or A Narrow Escape.
These  were  all  rated  by  the  Hillegas  Scale  for  the  Measure- 
ment of  Quality  in  English  Composition.  The  first  set  of  com- 
positions was  rated  by  from  four  to  eight  experienced  judges. 
The  average  of  their  marks  was  taken  as  the  score  for  the  com- 
position. The  second  set  was  rated  by  four  experienced  judges 
and  the  third  set  by  three  of  the  four  who  rated  the  second  set. 
Here  also  the  ratings  were  averaged  to  determine  the  score  for 
the  composition. 

The  time  allowed  for  writing  the  composition  was  thirty  min- 
utes for  the  first  two  sets  and  fifty  minutes  for  the  third  set. 

Spelling ⁸

The  Ayres  Measuring  Scale  for  Ability  in  Spelling  was  used. 
The  first  time  the  tests  were  given  fifty  words  were  selected 
from  the  Q  list.  The  Q  list  is  rated  as  of  a  difficulty  such  that 
the  average  score  of  a  seventh  grade  class  should  be  92  per  cent. 
In  the  second  testing  fifty  words  selected  from  lists  U,  V,  W, 
and  X  were  given.  The  last  time  fifty  words  from  lists  T  to  Z 
inclusive  were  used. 

The  words  were  pronounced  by  the  regular  teacher.  Each 
word  was  pronounced  twice  and  a  third  time  if  asked  for.  One 
point  was  given  for  each  word  spelled  correctly.  The  teachers 
did  not  score  the  papers. 

Opposites Tests ⁹
The  Opposites  Tests  consist  of  twenty  words  each.     The  pur- 
pose of  the  test  is  to  determine  the  number  of  words  having 

7  Hillegas,  Milo  B.,  "A  Scale  for  the  Measurement  of  Quality  in  English 
Composition  by  Young  People,"  Teachers  College  Record,  Vol.  XIII,  No.  4. 

8  Ayres,  Leonard  P.,  A  Measuring  Scale  for  Ability  in  Spelling,  Division 
of  Education,  Russell  Sage  Foundation,  Bulletin  E  139. 
a meaning opposite the words of the list, which can be written
in a given length of time. The "north-south" and the "long-short"
lists are of equal difficulty. These were used for the
first  two  testings.  The  time  allowed  was  seventy-two  seconds. 
The  "high-low"  list  is  made  up  of  the  easiest  words  of  the  other 
two  and  consequently  the  time  was  reduced.  Forty  seconds 
were  allowed  when  it  was  given  in  June,  1917. 

The  responses  were  classed  as  either  right  or  wrong.     One 
point  was  given  for  each  correct  response. 

Mixed Relations Test ⁹
In  the  Mixed  Relations  Tests  a  pair  of  words  is  given  to  indi- 
cate the  relation  desired  in  each  response  to  a  third  word.  There 
are  twenty  such  series  in  each  test.  Before  the  test  began  a 
sample  was  exhibited  and  the  explanation  made  that  after  the 
third  word  of  each  series  a  fourth  word  was  to  be  written  which 
would  have  the  same  relation  to  the  third  word  that  the  second 
had  to  the  first.  In  the  first  two  testings  one  hundred  and 
twelve  seconds  were  allowed,  but  the  third  time  this  was  reduced 
to  ninety  seconds.  The  responses  were  considered  either  right 
or  wrong,  one  point  being  given  for  each  one  right. 

Easy Directions Test ⁹
In  the  Easy  Directions  Tests  the  subject  is  directed  to  make 
a  definite  response  such  as :  Cross  out  the  smallest  dot  .  •  •  , 
or  Cross  out  the  g  in  tiger.  The  two  tests  are  of  approximately 
equal  difficulty.  The  "smallest  dot"  test  was  given  in  Febru- 
ary, 1916,  and  the  "g  in  tiger"  test  both  times  the  tests  were 
repeated.  One  point  was  given  for  each  correct  response.  The 
time  allowed  was  eighty-two  seconds  for  the  first  two  testings 
and  eighty  for  the  third. 

Hard Directions Test ⁹

The Hard Directions Test is similar to the Easy Directions
except that "the object here is to complicate the directions
somewhat, by calling for conditional and alternative responses,
etc." The first two or three directions are easy enough to insure
a proper start on the test and the rest are more complicated.
Because of the "conditional and alternative" responses
the scoring is somewhat complicated. A standard of twenty-two
possibilities for mistakes was decided upon and used consistently
in scoring. From "twenty-two" one was deducted for
each wrong response. The time allowed was two minutes.

9 Woodworth, R. S., and Wells, Frederic Lyman, "Association Tests,"
Psychological Monographs, Vol. XIII, No. 5.

All  of  the  tests  were  either  scored  or  their  scorings  checked 
by  one  or  the  other  of  the  two  persons  chiefly  interested  in  the 
prosecution  of  this  study, — except  the  Composition  tests,  the 
scores  of  which  as  has  already  been  stated  are  averages,  the 
Algebra  tests,  and  twenty-eight  of  the  seven  hundred  and  ninety- 
two  papers  of  the  February  1916  testing.  In  scoring  the  papers 
and  copying  the  scores  extreme  care  was  taken  to  avoid  chance 
mistakes.  This  increased  the  amount  of  time  consumed,  but 
greater  accuracy  in  scoring  is  needed  for  individual  results  than 
for  group  results. 

It  is  very  essential  to  the  purposes  of  this  investigation  that 
the  scoring  be  uniform  and  that  as  fine  discriminations  as  pos- 
sible be  made  because  the  achievement  of  the  individual  in  spe- 
cific tests  is  the  problem  for  study.  An  error  in  scoring  which 
affects  the  group  standing  only  slightly  when  carried  over  to 
the  individual,  although  the  same  in  absolute  value,  has  rela- 
tively a  much  greater  significance  in  the  case  of  the  individual. 
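
The point about scoring errors can be made concrete with a small calculation. The sketch below rests on two assumptions of its own: "group standing" is read as the group mean, and the error is a single raw-score point. The Q of 1.60 is the February 1916 Spelling value from Table IX and 72 is the size of the group; the function name is hypothetical.

```python
def shift_from_error(error_points, q, group_size):
    """Return (individual shift, group-mean shift), both in multiples of Q."""
    individual = error_points / q
    group = error_points / (q * group_size)
    return individual, group

# One point mis-scored on one paper out of seventy-two.
ind_shift, grp_shift = shift_from_error(error_points=1, q=1.60, group_size=72)
print(f"individual: {ind_shift:.3f} Q, group mean: {grp_shift:.4f} Q")
```

On these figures the same one-point error moves the individual by more than half a Q while moving the group mean by less than a hundredth of a Q.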

3.     Special  Testing 

After  the  third  testing  of  the  entire  group  was  completed 
in  June,  1917,  a  special  testing  of  certain  boys  was  made  to 
compare  their  reactions  under  conditions  of  more  detailed  con- 
trol. In  Section  VI  the  results  obtained  from  the  special  test- 
ing are  analyzed  and  compared  with  the  results  from  the  origi- 
nal testings.  All  three  Spelling  tests  were  repeated  with  three 
boys. The words were pronounced by the writer. Each word
was pronounced twice and a third time if necessary. The "long-short"
and "high-low" Opposites tests were repeated with four
boys.  The  time  for  each  was  forty  seconds.  Five  boys  were 
given  both  Mixed  Relations  tests.  Time :  ninety  seconds  for 
each.  Both  Easy  Directions  tests  were  repeated  with  four  boys, 
the  time  allowed  being  eighty  seconds.  These  special  tests  were 
given  in  the  office  at  the  Speyer  School.     Not  more  than  three 
boys were tested at any one time. Since in these cases a low
score had been made in one or more of the original tests it was
suggested that probably this was caused by some disturbance
or that the boy was not feeling well on the day of the test;
further, that probably he could do better and that an opportunity
was then going to be given. The same explanation that was
given at the original testing was made.


TABLE VIII
The Tests and the Times at Which They Were Given

February, 1916
  Woody Multiplication     Series A
  Woody Division           Series A
  Trabue Completion        Scale B
  Trabue Completion        Scale C
  Reading                  Alpha 2, Part II
  Visual Vocabulary        Reading Scale A
  Composition              How I Would Spend Twenty Dollars
  Spelling                 50 words from Ayres Q List
  Opposites                North-South
  Mixed Relations          Good-Bad
  Easy Directions          Smallest dot

June, 1916
  Trabue Completion *      Scale D
  Trabue Completion *      Scale E

February, 1917
  Woody Multiplication     Series B
  Woody Division           Series B
  Trabue Completion        Scale J
  Trabue Completion        Scale K
  Reading                  Alpha 2, Part II
  Visual Vocabulary        Selection from Scale A 2
  Composition              What I Should Like to do Next Saturday
  Spelling                 50 words from Lists U to X
  Opposites                Long-Short
  Mixed Relations          Eye-See
  Easy Directions          G in tiger
  Hard Directions *

June, 1917
  Hotz Algebra             Add. and Subt.
  Hotz Algebra             Mult. and Div.
  Trabue Completion        Scale L
  Trabue Completion        Scale M
  Reading                  Selections from Beta and from S
  Visual Vocabulary        Scale B, y series
  Composition              How I Should Like to Spend My Vacation
  Spelling                 50 words from Lists T to Z
  Opposites                High-Low
  Mixed Relations          Good-Bad
  Easy Directions          G in tiger
  Hard Directions *

* These tests were used for a slightly different purpose from that of the eleven
tests above.


IV 

STATISTICAL  TREATMENT 

1.     Transmutation  and  Distribution  of  Scores 

When all the papers had been scored the first step was to record
the  scores  of  the  seventy-two  boys  in  such  manner  that  the  score 
of  every  boy  in  each  test  could  be  identified.  Tables  XXXIX 
to  XLI  in  the  Appendix  contain  these  results.  A  distribution 
table of the original scores was then made for each of the thirty-seven
tests. The semi-interquartile-range (Q) of each of these
distributions was found by using the formula:

$$Q = \frac{Q_3 - Q_1}{2}$$

where $Q_1$ and $Q_3$ are the first and third quartile points of the
distribution.


These  Q's  are  given  in  Table  IX.  The  reason  for  using  the  Q 
instead  of  the  S.D.,  which  was  used  in  the  preliminary  investiga- 
tion, is  discussed  under  topic  two  of  this  section. 
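
As a rough illustration of how a Q of this kind can be computed from a list of scores, the sketch below takes half the distance between the third and first quartile points. The linear interpolation rule for locating the quartile points is an assumption made here for convenience; the text does not say how the quartile points were located in the distribution tables. The scores are hypothetical.

```python
def quartile_point(sorted_scores, fraction):
    """Locate a quartile point by linear interpolation (an assumed convention)."""
    pos = fraction * (len(sorted_scores) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_scores) - 1)
    weight = pos - lower
    return sorted_scores[lower] * (1 - weight) + sorted_scores[upper] * weight

def semi_interquartile_range(scores):
    """Half the distance between the third and first quartile points."""
    ordered = sorted(scores)
    return (quartile_point(ordered, 0.75) - quartile_point(ordered, 0.25)) / 2

# Hypothetical scores for ten pupils in one test.
scores = [40, 42, 44, 45, 46, 47, 48, 48, 49, 50]
print(semi_interquartile_range(scores))
```

Other conventions for placing the quartile points give slightly different values for small groups, which is one reason the figures in Table IX cannot be reproduced exactly from a sketch like this.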

In  order  that  a  part  of  the  statistical  work  could  be  done 
before  the  last  tests  were  given  and  scored  in  June  1917,  the 
scores  of  the  group  of  seventy-five  pupils  who  took  the  eleven 
tests  in  February  1916  and  February  1917  were  used  for  the 
distributions  and  transmutations.  In  June  1917  three  of  these 
seventy-five  pupils  were  not  present  when  the  tests  were  given. 
Their  scores  in  the  two  previous  testings  were  dropped  from 
further  consideration  in  this  study.  This  produced  practically 
no  change  in  the  Q's  from  what  they  would  have  been  if  only 
the  seventy-two  pupils'  records  had  been  used  to  find  the  Q's, 
especially  since  of  the  three  records  missing  one  was  in  the  first 
tertile  and  two  in  the  second  tertile  in  February  1916,  and  one 
in  each  tertile  in  February  1917. 

Using  the  Q's  shown  in  Table  IX  the  intervals  of  each  dis- 
tribution according  to  the  original  scores  were  transmuted  into 
intervals  according  to  their  value  in  terms  of  the  Q  of  the  origi- 
nal distribution.  The  frequencies  in  these  intervals,  grouped 
in  intervals  of  one  Q,  are  shown  in  Table  X.  They  are 
graphically  represented  by  Figs.  3a,  b,  c,  to  13a,  b,  c. 
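
The transmutation and grouping described above might be sketched as follows. The raw scores, median, and Q are hypothetical, as are the function names; placing a score exactly at the median in the first interval above it is an assumption, and the open-ended lowest class of Table X is not handled.

```python
import math
from collections import Counter

def to_q_units(score, median, q):
    """Deviation of a raw score from the group median, in multiples of Q."""
    return (score - median) / q

def q_interval(value):
    """Label the one-Q interval holding `value`, counted outward from the median."""
    whole = math.floor(abs(value))
    sign = "+" if value >= 0 else "-"
    return f"{sign}{whole}.0 to {sign}{whole}.9"

# Hypothetical raw scores, median, and Q for one test.
raw_scores = [44, 46, 47, 48, 48, 49, 51]
median, q = 48, 1.6
distribution = Counter(q_interval(to_q_units(s, median, q)) for s in raw_scores)
for interval, frequency in sorted(distribution.items()):
    print(interval, frequency)
```

Each column of Table X corresponds to one such frequency count, taken over the pupils for a single test.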
TABLE IX

The Semi-Interquartile-Range (Q) of the Distribution of the
Original Scores for Each Test

Tests                        Feb. 1916   Feb. 1917   June 1917   June 1916
Woody Multiplication           2.13        1.21
Woody Division                 2.?2         .67
Hotz Alg. Add. and Subt.                               2.06
Hotz Alg. Mult. and Div.                               2.67
Trabue B, J, L, D              1.71        1.22        1.10        1.36
Trabue C, K, M, E              1.32        1.45        2.03        1.58
Reading Tests                  4.26        4.59        3.13
Visual Vocabulary              3.77       13.88        4.27
Composition                    5.60        4.17        5.14
Spelling                       1.60        3.94        3.13
Opposites                      1.89         .47        1.54
Mixed Relations                4.58        2.95        2.93
Easy Directions                2.44        1.49         .76
Hard Directions                            2.46        1.57


TABLE X

Distribution of Scores Transmuted Into Multiples of Q Above and
Below the Median of the Original Distribution in Each Test

(The column headings naming the tests are not legible. Frequencies in
each row are listed from left to right with empty cells omitted, so a
row with fewer than eleven entries cannot be aligned to particular tests.)

February 1916

Value in Q         Frequencies
+4.0 to +4.9        2   1
+3.0 to +3.9        1   3   1
+2.0 to +2.9        4   8  11   7   5   6  11
+1.0 to +1.9       14  15  16   5  14  11  10   7  14  17   8
  .0 to + .9       18  21  12  18  14  17  18  29  22  19  17
  .0 to - .9       16  14  21  23  22  18  19  12  11  14  19
-1.0 to -1.9        9  13   9  11   9   7  13  12  14  14  12
-2.0 to -2.9        7   7   3   1   4   8   4   3   7   8   5
-3.0 to -3.9        2   1   2   1   3   3   4
-4.0 to -4.9        1   1   3
-5.0 or lower       2   1   3

February 1917

Value in Q         Frequencies
+4.0 to +4.9
+3.0 to +3.9        1   2   2   3
+2.0 to +2.9        4   5   6   2   2   7
+1.0 to +1.9       13  18   5  17  18  16   9  14  11
  .0 to + .9       18  18  24  11  16  16  17  22  36  25  36
  .0 to - .9       12  13  23  28  18  21  18  12  16   8  13
-1.0 to -1.9       15  15   9   5   8  11  16  13   8  12  13
-2.0 to -2.9        6   5   4   2   6   2   2   7   7   2
-3.0 to -3.9        2   1   3   2  [?]  7   4
-4.0 to -4.9        1   1   2   2   1
-5.0 or lower       1   2   2   7   3

June 1917

Value in Q         Frequencies
+4.0 to +4.9        3
+3.0 to +3.9        4   1   1   1
+2.0 to +2.9        9   6   2   3   1   9
+1.0 to +1.9        8  15  19   7  12  12  12  16  19  10  30
  .0 to + .9       15  15  11  25  24  23  14  20  17  26   6
  .0 to - .9       26  18  22  12  13  15  21  13  19  15  17
-1.0 to -1.9        7  14   9  19  [?] 10  13  12   8  10   9
-2.0 to -2.9        2   4   4   4   7   5   2   2   6   5   5
-3.0 to -3.9        1   1   1   1   4   2   3
-4.0 to -4.9        3   1   4   1   1   2
-5.0 or lower       3   4   1   2   3


2.     The  Use  of  Averages  and  Variabilities 

Two  methods  of  comparing  the  scores  of  an  individual  in  all 
the  tests  were  discussed  in  Section  II.  Under  this  topic  of  this 
section  the  reason  for  a  slight  variation  of  the  method  selected 
will  be  given,  and  excerpts  from  studies  in  which  the  method  has 
previously  been  used  will  be  quoted. 

Because  one  purpose  of  this  study  is  to  discover  the  extremely 
variant  scores, — those  due  to  causes  that  prevent  the  individual 
from  making  a  normal  or  characteristic  reaction,  and  also  those 
due  possibly  to  unusual  ability  or  the  lack  of  ability — a  method 
that  would  tend  to  cover  up  these  sport  scores  should  not  be 
used,  but  rather  a  method  which  retains  their  variability  in  rela- 
tive proportions  should  be  used.  The  standard  deviation  as  a 
measure  of  variability  gives  more  weight  to  the  extreme  items 
than  to  those  nearer  the  central  tendency.  Therefore  it  would 
seem  that  a  measure  of  variability  which  avoids  such  weighting 
should  be  chosen.  The  Q  was  chosen  because  with  it  the  range 
of  the  items  beyond  it  does  not  affect  the  measure  of  variability, 
— the  number  of  items  beyond  a  given  point  being  the  influencing 
factor. 
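
The contrast drawn here between Q and the S.D. can be seen in a small numerical sketch. The scores are hypothetical, and the quartile points are located by linear interpolation, which is an assumption of this sketch rather than the procedure of the study. One low score is pushed far out to imitate a sport score.

```python
from statistics import pstdev

def quartile_point(sorted_scores, fraction):
    """Locate a quartile point by linear interpolation (an assumed convention)."""
    pos = fraction * (len(sorted_scores) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_scores) - 1)
    weight = pos - lower
    return sorted_scores[lower] * (1 - weight) + sorted_scores[upper] * weight

def semi_interquartile_range(scores):
    """Half the distance between the third and first quartile points."""
    ordered = sorted(scores)
    return (quartile_point(ordered, 0.75) - quartile_point(ordered, 0.25)) / 2

typical = [40, 42, 44, 45, 46, 47, 48, 48, 49, 50]
with_sport = [10] + typical[1:]  # the lowest score moved far from the group

for label, scores in (("typical", typical), ("with sport", with_sport)):
    q = semi_interquartile_range(scores)
    print(f"{label}: Q = {q:.3f}, S.D. = {pstdev(scores):.3f}")
```

The Q is the same for both lists because the extreme score still lies below the first quartile point; only its count matters, not its distance. The S.D. grows severalfold.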

A minor reason for using the Q rather than the S.D. is found
in the matter of mathematical inaccuracies introduced by the
lopping off of fractions. The measures of the group are carried
to two decimal places, the second being an approximation determined
by the size of the third, and the measures of the individual
are carried to one decimal place only with the same method
of approximation. In cases of five-tenths or more the preceding
figure is increased by one; in cases of less than five-tenths
it is not changed. The S.D. being 1.4825 Q means that it
would carry with it a relatively greater mathematical approximation
in each case.
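
The ratio between the S.D. and the Q quoted here holds for the normal distribution and can be checked with Python's standard library. This is a verification sketch only; it says nothing about how the figure was obtained in the study.

```python
from statistics import NormalDist

# In a normal distribution the third quartile lies about 0.6745 S.D.
# above the median, so Q is about 0.6745 S.D.
q_in_sd_units = NormalDist().inv_cdf(0.75)
sd_in_q_units = 1 / q_in_sd_units

print(f"Q = {q_in_sd_units:.4f} S.D.")
print(f"S.D. = {sd_in_q_units:.4f} Q")
```

The computed ratio agrees with the value given in the text to three decimal places.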

The  following  excerpts,  one  from  a  study  by  Naomi  Nors- 
worthy  and  the  other  from  a  study  by  R.  S.  Woodworth,  de- 
scribe and  discuss  the  method  more  fully. 

We  now  have  two  series  of  grades  in  the  same  measurement,  one  set 
from  mentally  defective  children  and  the  other  from  ordinary  school  chil- 
dren. The  usual  method  of  comparing  such  results  is  to  compare  the 
records  of  one  set  of  individuals  with  the  central  tendency  of  those  of 
the  others  of  the  same  age  and  sex.  But  in  this  case  there  were  not  enough 
defectives  of  any  age  to  make  the  results  gained  from  such  treatment  of 
any  value,  consequently  a  different  method  has  to  be  adopted.  The  method 
used  in  dealing  with  the  majority  of  the  measurements  was  one  which 
enabled  me  to  compare  the  records  of  all  of  the  defectives  with  those  of  all 
the  ordinary  children  without  restriction  as  to  age  or  sex.  Another  very 
decided  advantage  is  the  fact  that  the  units  of  grading  are  identical 
throughout  all  the  measurements,  as  will  be  evident  from  the  following 
description. 

The  difference  between  the  record  of  each  defective  in  any  test  and  the 
median  for  an  ordinary  child  of  the  same  age  and  sex  was  found.  This 
difference  was  then  transmuted  into  positive  or  negative  multiples  of  the 
probable  error  as  the  case  required.  .  .  .  By  thus  transmuting  the  dif- 
ference between  the  grading  received  by  defectives  and  ordinary  children 
respectively  in  every  test  into  multiples  of  the  probable  error  of  the  ap- 
propriate age  and  sex  I  can  compare  the  records  of  the  150  defectives  tested 
with  the  500  or  600  ordinary  children,  just  as  if  I  had  150  idiots  and  600 
school children all of the same age and sex. Not only by this method can
I  consider  all  my  cases  together,  but  each  test  is,  so  far  as  is  possible 
comparable  with  every  other,  irrespective  of  whether  the  trait  examined 
is  physical  or  mental.  This,  so  far  as  I  know,  has  not  yet  been  done.  .  .  . 
This  method,  then,  provides  a  measure  by  which  we  can  tell  not  only  how 
far  the  idiots  are  below  school  children  in  the  various  traits  tested,  but 
how  much  farther  below  they  are  in  one  mental  trait  than  in  another  and 
whether they are equally deficient in physical and mental traits.¹

What  is  needed  is  a  method  of  combining  results  which  shall  preserve  all 
the  refinement  of  the  original  measurements.  Such  a  method  exists,  and 
is  certainly  familiar  to  statisticians;  but  it  seems  to  be  overlooked  in  many 
cases  where  it  would  prove  of  value. 

Here  follows  a  discussion  of  averages  and  variabilities. 

There  is  a  way  of  eliminating  both  of  the  troublesome  quantities — both 
the  absolute  value  of  the  average  and  the  absolute  measure  of  variability. 
Let  the  average  in  each  case  be  counted  as  0,  i.e.,  let  the  individual's 
standing  be  expressed  as  a  deviation  above  or  below  the  average;  and 
further,  let  the  measure  of  variability  be  taken  as  the  unit  deviation,  and 

1  Norsworthy,  Naomi,  The  Psychology  of  Mentally  Deficient  Children, 
Columbia  University  Contributions  to  Philosophy  and  Psychology,  Vol.  XV, 
No.  2,  p.  50. 
all deviations be expressed as fractions or multiples of this unit. (For the
measure of variability, either the average deviation, or the mean square
deviation, or the quartile, etc., may be chosen.) What this method does
is to assign each individual's position in the distribution of the group: he
stands, namely, above or below the group average, and so and so much
above or below as compared with the average variation of the group.

No assumption is made by this method as to the ratio between the
variability and the group average; for the average is taken as 0 and the
variability as 1, independently the one of the other. The only assumptions
underlying the method are those involved in every use of averages and
variabilities, namely, that the average means the same thing in respect to
one distribution as in respect to another, and, likewise, that the measure
of variability means the same thing in respect to the different distributions.
Both of the assumptions are correct if the distributions are of the "normal"
type, or if all the distributions belong to any one type. Were one distribution
normal, another markedly skew, and a third distinctly bimodal,
neither the average nor the average deviation would mean quite the same
thing in respect to the three, and the method would be illegitimate; but in
such a case it is doubtful if the distributions ought properly to be combined
at all. Mental tests usually give group distributions not very different
from the "normal" though tending on the whole to be somewhat skew
in such a way that more individuals lie on the good side than on the bad
side. The distributions for different tests do not differ much in shape,
and no considerable error can be introduced by placing the average always
equal to 0 and the average deviation (or mean square deviation, etc.)
always equal to 1.²

Although the method is fully described by the two quotations
just  given,  a  concrete  illustration  from  this  investigation  may 
serve  as  an  aid  in  projecting  the  method  to  this  study.  In  Fig. 
2  are  represented  graphically  the  records  of  the  same  three  in- 
dividuals whose  scores  according  to  the  classification  by  rank 
are  shown  in  Fig.  1.  The  reduction  of  the  number  of  subjects 
from  ninety-seven  in  the  preliminary  investigation  to  seventy- 
two  in  the  study  proper  changed  the  serial  numbers  so  that  Case 
60  here  is  Individual  45 ;  Case  65,  Ind.  49 ;  and  Case  92,  Ind.  71. 
The  reduction  of  the  number  in  the  group  makes  it  unfair  to 
compare  directly  the  range  and  placement  of  scores  in  values 
of  Q  with  their  range  and  placement  by  rank.  Further  com- 
parison in  this  respect  will  be  made  in  Section  VI.  Fig.  2  is 
introduced  here  to  illustrate  the  method. 

2 Woodworth, R. S., "Combining the Results of Several Tests; A Study
in Statistical Method," The Psychological Review, Vol. XIX, pp. 97-101.

Fig. 2. Distribution of the Scores, in Values of Q, of Three Individuals.
February 1916 Tests. The letters signify tests as follows:

X - Woody Multiplication      T - Composition
W - Woody Division            S - Spelling
B - Trabue B                  O - Opposites
C - Trabue C                  R - Mixed Relations
A - Reading Alpha 2           I - Easy Directions
V - Visual Vocabulary

To compare these individuals in their standing in the same
test, in terms of achievement in that test, their scores should
be read by the scale at the right of Fig. 2. In spelling (S)
for example, Ind. 45 is .1 Q above the median of the group (Med.
gr.); likewise, Ind. 49 is .1 Q above the median of the group.
They both received the same original score in spelling, which
was 48. Ind. 71 is 1.2 Q below the median of the group in spelling.
His score was 46. The achievement in this spelling test
is then the same for Individuals 45 and 49, and 1.3 Q less for
Ind. 71. The median achievement in one test is rated equal to
the median achievement in another, and likewise, the Q deviation
in one test is equated with that of any other.
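
The spelling figures in this illustration can be retraced with a short calculation. The raw scores (48 and 46) and the Q of 1.60 (Table IX, Spelling, February 1916) come from the text. The group median of 47.9 is a hypothetical value chosen only because it is consistent with the reported standings; the actual median is not stated. Rounding follows the rule given earlier in this section, in which five-tenths or more raises the preceding figure.

```python
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")

def in_q(difference, q):
    """Express a raw-score difference in Q, rounded to one decimal (half up)."""
    return (Decimal(difference) / Decimal(q)).quantize(TENTH, rounding=ROUND_HALF_UP)

Q_SPELLING = "1.60"              # Table IX, Spelling, February 1916
GROUP_MEDIAN = Decimal("47.9")   # hypothetical, not given in the text

ind_45 = in_q(Decimal("48") - GROUP_MEDIAN, Q_SPELLING)
ind_71 = in_q(Decimal("46") - GROUP_MEDIAN, Q_SPELLING)
gap = in_q(Decimal("48") - Decimal("46"), Q_SPELLING)

print(ind_45, ind_71, gap)
```

The two-point difference in raw score comes to exactly 1.25 Q, which the half-up rule carries to 1.3 Q. Python's built-in `round` would give 1.2 for the same value, so the `decimal` module is used to follow the stated rule.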

This explanation holds also in comparing the individual's
achievement in different tests and the achievement of different
individuals in different tests. For example, in relation to the
median achievement of the group in all the tests Ind. 45 achieved
the same in Trabue B and Mixed Relations as did Ind. 71 in
Woody Division. All three of these scores are 1.0 Q above the
median of the group. The achievement of one individual in
different tests is simply stated in multiples of Q above and below
the median of the group, and therefore, compared with the
median of the group in all the tests by reading from the scale
at the right of the figure.

To compare these individuals in their standing in the same
test,  in  terms  of  their  own  median  achievement,  their  scores 
should  be  read  by  the  scale  at  the  left  of  Fig.  2.  The  median 
line  of  this  scale  begins  with  the  median  score  of  the  individual 
ranking  highest  as  judged  by  median  attainment,  and  connects 
the  median  scores  of  the  three  individuals.  Using  spelling  again 
the  figure  shows  that  in  comparison  with  their  other  scores  Inds. 
49  and  71  are  alike  in  achievement  in  spelling, — spelling  repre- 
sents their  median  achievement — while  the  spelling  achievement 
of  Ind.  45  is  1.0  Q  below  his  median  achievement.  This  shows 
that  the  original  scores  cannot  be  taken  at  face  value  in  diagnos- 
ing individual  cases.  This  answers  the  question  in  Section  I 
concerning  the  meaning  that  the  same  scores  in  a  single  test 
may  have  in  connection  with  the  achievement  of  different  indi- 
viduals. Two  scores  of  the  same  test  having  the  same  value 
by  the  original  scoring  may  have  very  different  meaning  in  the 
two  distributions  of  the  individuals'  scores;  one  may  be  compara- 
tively high  among  the  achievements  of  one  individual,  while  the 
other  may  be  comparatively  low.  Any  two  scores  can  be  related 
to  the  median  achievement  of  the  respective  individuals  by  re- 
ferring them,  following  the  line  marked  out  by  the  median  of  the 
individuals,  to  the  scale  at  the  left  of  the  figure. 
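
Referring a score to the individual's own median amounts to a second subtraction, which the following sketch illustrates. The two pupils and all their scores are hypothetical except that pupil A's spelling standing (.1 Q above the group median and 1.0 Q below his own median) mirrors the figures reported for Ind. 45; only three tests are used to keep the example short.

```python
from statistics import median

def relative_to_own_median(q_scores):
    """Re-express each Q-score as a deviation from the pupil's own median."""
    own = median(q_scores.values())
    return {test: round(value - own, 1) for test, value in q_scores.items()}

# Scores already transmuted into multiples of Q about the group median.
pupil_a = {"Spelling": 0.1, "Opposites": 1.1, "Reading": 1.4}
pupil_b = {"Spelling": 0.1, "Opposites": -0.4, "Reading": 0.6}

print(relative_to_own_median(pupil_a)["Spelling"])
print(relative_to_own_median(pupil_b)["Spelling"])
```

Both pupils stand at the same point in spelling relative to the group, yet the score is low among pupil A's achievements and typical among pupil B's.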

In the excerpt above Woodworth points out that the validity
of not only this but of any method of comparing or equating
scores depends to a large extent upon the similarity among the
forms of distribution of the measures. Figs. 3a, b, c to 13a, b, c
show graphically the distributions of scores in values of Q in
thirty-three of the tests. The distributions show no great dissimilarity
except in the cases of 11b, Opposites, February 1917;
13b, Easy Directions, February 1917; and 13c, Easy Directions,
June 1917, which are rather markedly skewed. The other
distributions show skewness upward and downward to varying
degrees, but are similar enough in general to be compared with
but little loss in accuracy from this cause.
The effect of the skewness of the original distributions mentioned
above upon the Q is illustrated by Fig. 14. The piling
up of scores reduces the extent of the semi-interquartile-range
measured by intervals of the original distribution of the scores
of the group. The reduction is more pronounced in the range
above the median than in the range below the median. In this
form of distribution the Q is 87.4 per cent of the Q of the normal
distribution.³ The reduction in the value of the Q increases
the minus value of the low scores expressed in multiples of Q.
It places them, when compared with scores of tests having a
normal distribution, at a greater distance from the median than
they would be if the test had been of such nature that the higher
scores had been spread out approximating more closely the form
of the normal distribution. Moreover, the high scores which
would have been higher if the test had permitted are covered
up by this form of distribution. In Fig. 14 a score 3 Q below the
median of the normal distribution would be 3.43 Q below the
median in the skewed distribution.

3 Thorndike, E. L., Mental and Social Measurements, p. 73. Surface of
Frequency of Form C. (The Q is derived from the values of σ and the
surfaces A and C are smoothed.)

Figs. 3a, b, c to 13a, b, c. Distribution of Scores in Each Test Transmuted
into Multiples of Q Above and Below the Median of the Original
Distribution.

           Feb. 1916 (a)      Feb. 1917 (b)      June 1917 (c)
Fig. 3     Woody Mult.        Woody Mult.        Alg., Add. Subt.
Fig. 4     Woody Div.         Woody Div.         Alg., Mult. Div.
Fig. 5     Trabue B           Trabue J           Trabue L
Fig. 6     Trabue C           Trabue K           Trabue M
Fig. 7     Read. Alpha 2      Read. Alpha 2      Reading
Fig. 8     Visual Vocab.      Visual Vocab.      Visual Vocab.
Fig. 9     Composition        Composition        Composition
Fig. 10    Spelling           Spelling           Spelling
Fig. 11    Opposites          Opposites          Opposites
Fig. 12    Mixed Relat.       Mixed Relat.       Mixed Relat.
Fig. 13    Easy Direct.       Easy Direct.       Easy Direct.

Fig. 14. Showing the Effect of Two Different Forms of Distribution
upon the Q.


3.     Redistribution  of  Scores 

After the intervals of the original distributions were transmuted
into multiples of Q the next step was to transmute each
score into a multiple of the Q of its distribution. A table containing
such transmutations of the 2664 scores,⁴ grouped vertically
for the tests and horizontally for the individuals, was
made. From a part of this table, that part showing the transmuted
scores of eleven of the June 1917 tests, the chart shown
in Fig. 15 was constructed.
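
The transmutation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the quartiles are found with the "inclusive" convention of the standard library, which may differ slightly from the hand method used in the study, and the function names are hypothetical.

```python
from statistics import median, quantiles

def semi_interquartile_range(scores):
    """Q: half the distance between the first and third quartile points."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

def transmute(scores):
    """Express each score as a multiple of Q above (+) or below (-) the group median."""
    med = median(scores)
    q = semi_interquartile_range(scores)
    return [(s - med) / q for s in scores]

# Invented scores of nine pupils in one test (illustration only).
test_scores = [12, 15, 17, 18, 20, 22, 23, 25, 28]
in_q = transmute(test_scores)
```

Because every test is reduced to the same unit, a value such as -1.0 in one test can be set beside a value of +0.5 in another without reference to the original scales.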

As already suggested under Topic 2, achievement by any two
scores of one test or different tests can be compared directly
by this method. The chart of Fig. 15 facilitates such comparison
of the scores of the June 1917 tests. The scales for the
graph are the same as those of Fig. 2: the one at the right gives
values of Q above and below the median of the group, and the
one at the left, values above and below the individual medians.

For  use  in  the  calculations  made,  charts  similar  to  this  were 

4 This would be the number if every boy had taken every test. Scores
are lacking for pupil No. 40 in Composition and Spelling, June, 1917, and
for pupil No. 27 in Woody Division, February, 1917. In these cases the
median score of the group was supplied. Four scores are lacking in Trabue
D and E each. These were not supplied because these two tests were not
used in the group of eleven.


Fig. 15. Chart Showing the Value of Q in Each Score of the June 1917
Tests. (Horizontal axis: serial number of individual.) The letters
signify tests as follows:

a: Algebra, Add. Subt.      b: Reading               o: Opposites
x: Algebra, Mult. Div.      v: Visual Vocabulary     r: Mixed Relations
l: Trabue L                 t: Composition           i: Easy Directions
m: Trabue M                 s: Spelling

The scale at the left of the chart is for values above and below the
median of the individuals. The scale at the right of the chart is for
values above and below the median of the group.

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constructed  for  each  of  the  other  two  testings  using  the  eleven 
tests, the eight Trabue tests combined, the six mathematics tests
combined, the five directions tests combined, and the three reading
tests combined. In these charts each score of each individual
is identified by a letter so that any two of them can be found
and compared in respect to achievement in values from the
median of the group or in relation to the other achievements
of the individual.

Up to this point the experimental material used in this study
and the method of treating this material statistically have been
considered. The subjects and the tests have been described.
The subjects represent very closely typical seventh grade ability.
The tests used were not devised for this special study but are
tests which have been carefully standardized and used extensively
in other investigations. A slight variation from the
method of equating scores decided upon in the preliminary investigation
has been discussed. The Q rather than the S.D. is
used as the measure of variability because the extreme scores,
those which probably do not represent the individual's normal
reaction, have no greater effect upon the Q than do other scores,
while they do have a greater effect upon the S.D. than other
scores nearer the central tendency have. It has been shown that
in a given test two scores having the same value do not necessarily
have the same meaning for the two individuals, but that the
meaning of each score must be interpreted by comparison with
the other achievements of the individual. For example, one
of two scores equal in value may be very low for one individual
in comparison with his other achievements while the same score
for another individual may be equal to or above his median
achievement. The next section will deal with the results found
in connection with individual variability as compared with group
variability.
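
The reason given for preferring the Q to the S.D. can be illustrated with a small sketch. The scores are invented, the helper name is hypothetical, and the quartile convention is the "inclusive" one of the standard library rather than the method of the study. One low score is replaced by an almost absolute failure, and the two measures of variability are compared.

```python
from statistics import pstdev, quantiles

def q_of(scores):
    """Semi-interquartile range of a set of scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

# Invented scores of nine pupils in one test.
typical = [12, 15, 17, 18, 20, 22, 23, 25, 28]
# The same group, with the lowest score replaced by an extreme failure.
with_extreme = [0, 15, 17, 18, 20, 22, 23, 25, 28]

q_change = q_of(with_extreme) - q_of(typical)        # Q is unaffected
sd_change = pstdev(with_extreme) - pstdev(typical)   # S.D. is enlarged
```

In this example the extreme score counts in the Q only as one score lying below the first quartile point, whereas its full distance from the mean enters the S.D.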


INDIVIDUAL  VARIABILITY  COMPARED  WITH  GROUP 
VARIABILITY 

1.     The  Amount  of  Individual  Variability 

What is the amount of the individual's variability among the
different tests? Stated more specifically, do the scores of some
individuals tend to be high, do the scores of others tend to be
near the average of the group, and do the scores of still others
tend to be low? Or are the scores of most individuals so spread
out that there is no well defined mode? The answers to these
questions involve other questions, namely: By what standard
shall the individual's variability be measured, and by what
method can the measurement be made?

The unit of measurement already described will be used.
The Q of the group will be taken as the unit or standard. The
median achievement will be taken as the starting point and
variability will be measured in terms of the amount of the deviation
in either one or both directions from the median. What
then is the amount of variability of the group? It is the standard
or unit, one Q. Having now related the standard to the
problem under consideration the question can be asked in more
specific terms, namely: What per cent of the variability of the
group in each test is the variability of the individual in all the
tests, that is, what per cent of the Q taken as the standard is the
variability of the individual?

With the scores transmuted into values of Q and redistributed
in charts such as that shown in Fig. 15, the range between the
two extreme scores, the range between the median score and the
last score above the median, the range between the median score
and the last score below the median, and the range between the
third score above and the third score below the median were
found for each individual. Using a scale these were read directly
from the charts. The averages of these ranges for the
group and for different divisions of the group are shown in Table
XIII. The last of the ranges enumerated above, the range between
the third score above and the third score below the median,
is an approximation of the interquartile range. It covers a
distance of 3 intervals on each side of the median whereas the
interquartile range covers a distance of only 2.75 intervals on
each side. It was used as a matter of economy in calculation.
This distance was readily determined whereas the distance of
2.75 intervals would have necessitated interpolation for the value
in every case. The following correction was made for the approximation
of the interquartile range so that the results would
be comparable with the Q of the original distributions.
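
The four ranges read from the charts can be sketched as follows. This is a hypothetical illustration: the eleven transmuted scores are invented, and the function assumes an odd number of scores, so that the median is itself one of the scores and the "third score above" and "third score below" are well defined.

```python
from statistics import median

def individual_ranges(q_scores):
    """Ranges of one individual's scores, all expressed in multiples of Q."""
    s = sorted(q_scores)
    if len(s) % 2 == 0 or len(s) < 7:
        raise ValueError("expects an odd number of scores, at least seven")
    med = median(s)
    mid = len(s) // 2            # position of the median score
    return {
        "total": s[-1] - s[0],                   # between the two extreme scores
        "above": s[-1] - med,                    # median to last score above
        "below": med - s[0],                     # median to last score below
        "approx_iqr": s[mid + 3] - s[mid - 3],   # third above to third below
    }

# Invented transmuted scores of one pupil in eleven tests.
pupil = [-2.0, -1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.25, 1.5]
ranges = individual_ranges(pupil)
```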

By the method used the approximation of the interquartile
range covers a distance of 6 intervals, 3 intervals on each side
of the median. The interquartile range covers a distance of 5.5
intervals, 2.75 intervals on each side of the median. The extent,
in terms of the Q of the original distributions, of 3/5.5 of the
measures on each side of the median was found. The extent
of 2.75/5.5 of the measures on each side of the median is desired.
3/5.5 of 50 per cent = .2727. Using a table¹ of values of x/Q
of the normal probability integral it is found that 27.27 per cent
of the surface in each direction from the median corresponds to
a distance of approximately 1.11 Q on the base line. Hence the
values found are 111 per cent of the values desired. Dividing
the values given in Table XIII (E) by 2 and making this correction
we have the values of the semi-interquartile range or Q
which are given in Table XI.
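
The correction factor can be checked numerically. The sketch below uses the normal distribution of the Python standard library in place of the printed table of the probability integral; the function name `corrected_q` is hypothetical, and the normal form of distribution is assumed as in the text.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Part of the surface on each side of the median covered by 3 of the
# 5.5 intervals: 3/5.5 of 50 per cent.
area_covered = 3 / 5.5 * 0.50

# Distances on the base line, in units of sigma.
x_covered = std_normal.inv_cdf(0.5 + area_covered)
x_quartile = std_normal.inv_cdf(0.75)      # the distance equal to one Q

# The distance covered, in multiples of Q (approximately 1.11).
correction = x_covered / x_quartile

def corrected_q(approx_interquartile_range):
    """Halve the approximate interquartile range and apply the correction."""
    return approx_interquartile_range / 2 / correction
```

Applied to the total average of 1.82 in Table XIII (E), the function returns approximately .82, the corrected total of Table XI.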

TABLE XI

Average of the Individual Semi-Interquartile-Ranges in the
Eleven Tests

The table reads as follows: In February, 1916, the average of the
Q's of the first quartile was 80 per cent of the Q taken as the standard, etc.

                           Quartile                 Tertile                Corrected
                      I     II    III    IV      I     II    III    Total    Total
February 1916        .80   .90   .91    .99     .80   .93    .96     .90      .81
February 1917        .71   .86   .87   1.17     .75   .82   1.14     .90      .81
June 1917            .79   .75   .94   1.29     .76   .88   1.19     .94      .85
Average              .77   .84   .91   1.15     .77   .87   1.10     .91      .82
Corrected Average    .69   .76   .82   1.04     .69   .78    .99     .82

1  Thorndike,  E.  L.,  Mental  and  Social  Measurements,  p.  220. 
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Inspection  of  this  table  gives  an  answer  to  the  question  raised 
above, namely: What is the amount of the individual's variability
among the different tests? Measured in terms of Q the
average individual variability is 82 per cent of the variability
of the group. That is, the average semi-interquartile-range of
the individual is 82 per cent of the average semi-interquartile-range
of the group. The average range in ranks of the ninety-seven
pupils studied in the preliminary investigation and given
in Section II was found to be 82 per cent of the total range possible.
These two figures, supplementing each other as they do,
are convincing evidence of the large amount of variability among
the achievements of these pupils in the different tests. This
variability is evidence of the unreliability of one test or a small
number of tests used for the purpose of educational prognosis.

The table shows further that the individual variability is
greater in the third testing than in either of the first two, but
not enough greater to be of significance. The difference in variability
among the different divisions of the group as ranked by
median achievement is consistent enough and large enough to
be significant. When grouped either in quartiles or tertiles the
lower ranking pupils are found to be more variable in their
achievements. The corrected averages show that the variability
of the fourth quartile is 50 per cent greater than the variability
of the first, and that the variability of the third tertile is 43 per
cent greater than that of the first. The fourth quartile exceeds
the standard adopted and the third tertile almost equals it.

TABLE XII
Distribution of the Individual Semi-Interquartile-Ranges
(Approximation) in the Eleven Tests

Value in Q      Feb. 1916    Feb. 1917    June 1917
2.0 to 2.4                                    2
1.5 to 1.9          3            5            4
1.0 to 1.4         22           20           25
 .5 to  .9         43           41           34
 .0 to  .4          4            6            7


Table XII gives the distribution of the individual semi-interquartile-ranges
(approximation) for the three testings. It
shows a slight increase in individual variability the longer the
pupils remain in school. It means that there is a slightly greater
range in the achievements of the individual pupils in the different
tests. However, this difference is not great enough to
base any conclusions upon it.

The amount of individual variability can be further measured
by finding the total range, the range above the median, and
the range below the median for each individual in his different
achievements. The results are given in Table XIII.


TABLE XIII

Averages in Connection With Individual Ranges in Scores
Transmuted Into Multiples of Q

The table reads as follows: In February, 1916, the average total
range of the pupils in the quartile ranking highest was 4.14 Q, etc.

(A) Average Total Range in the Eleven Tests

                         Quartile                    Tertile
                    I      II     III    IV       I      II     III     Total
February 1916      4.14   4.14   5.39   5.04     4.03   5.15   4.85     4.68
February 1917      4.71   4.69   4.43   5.68     4.65   4.45   5.52     4.88
June 1917          4.21   3.97   4.59   6.42     4.05   4.55   5.79     4.87
Average            4.35   4.27   4.80   5.71     4.24   4.72   5.39     4.78

(B) Average Range Above Individual Medians in the Eleven Tests

                         Quartile                    Tertile
                    I      II     III    IV       I      II     III     Total
February 1916      1.70   2.07   2.00   2.14     1.72   2.16   2.05     1.98
February 1917      1.48   1.78   1.75   2.18     1.49   1.89   2.02     1.80
June 1917          1.73   1.79   1.98   2.65     1.73   1.83   2.55     2.04
Average            1.64   1.88   1.91   2.32     1.65   1.96   2.21     1.94

(C) Average Range Below Individual Medians in the Eleven Tests

                         Quartile                    Tertile
                    I      II     III    IV       I      II     III     Total
February 1916      2.44   2.07   3.39   2.90     2.31   2.99   2.80     2.70
February 1917      3.22   2.92   2.68   3.49     3.16   2.56   3.50     3.08
June 1917          2.47   2.18   2.61   3.77     2.32   2.72   3.23     2.74
Average            2.71   2.39   2.89   3.39     2.60   2.76   3.18     2.84

(D) Average Total Range in Certain Tests Combined

                         Quartile                    Tertile
                    I      II     III    IV       I      II     III     Total
Eight Trabue       3.57   3.57   3.56   3.18     3.69   3.40   3.33     3.47
Six Mathematics    2.84   3.43   2.86   3.81     2.70   3.38   3.62     3.23
Five Directions    1.91   2.33   2.66   2.91     2.11   2.20   3.05     2.45
Three Reading      1.59   1.74   1.62   2.45     1.73   1.57   2.25     1.85
Average            2.48   2.77   2.68   3.09     2.66   2.64   3.06     2.75

(E) Average Interquartile Range (Approximation) in the Eleven
Tests

                         Quartile                    Tertile
                    I      II     III    IV       I      II     III     Total
February 1916      1.59   1.79   1.81   1.97     1.60   1.85   1.92     1.79
February 1917      1.42   1.71   1.73   2.34     1.50   1.63   2.27     1.80
June 1917          1.57   1.50   1.88   2.57     1.52   1.75   2.37     1.88
Average            1.53   1.67   1.81   2.29     1.54   1.74   2.19     1.82

Although the range is not so reliable as other measures of
variability, still some deductions can be drawn from Table XIII
which are significant. In all the parts of this table the fourth
quartile shows consistently a marked increase in variability
over the first, and likewise, the third tertile over the first, except
in the case of the eight Trabue tests. This shows that among
different abilities and in the same ability the range of achievements
of the low ranking pupils is greater than that of those
ranking high. Is this because of the poor showing, oftentimes
almost absolute failure, they make in some tests? Parts B and
C of Table XIII bear specifically upon this question. The pupils
ranking low consistently have a greater range above their median
achievement than do the pupils ranking high. Table XXIII
shows that for all three testings there were twenty very low
scores, almost absolute failures, in the highest tertile, and thirty-six
very low scores in the lowest tertile. The difference between
these two numbers is not sufficient to account for the greater
range in ability on the part of the duller pupils. The results
tend to show that the greater variability of the low ranking
pupils is due to some factor inherent in the nature of their work.

Parts A, B, and C of Table XIII should be compared with
the first three columns of Table IV, which show the greatest
total range of achievements in the second and third quartiles,
and also a smaller range below the median in the fourth quartile
than below the median in the first quartile. The data of
Table XIII, as has already been pointed out, are more reliable
than those of Table IV. The results shown in Table XIII will
be discussed further in connection with the results of Tables
XIV, XVIII, and XIX.

TABLE XIV

Comparison of the Variability of Individual Ranges in the Eleven
Tests by Quartiles and by Tertiles

The per cents given are computed from the average of the average ranges
in the three testings, *except the average of the Eight Trabue, Six
Mathematics, Five Directions, and Three Reading Tests.

The table reads as follows: In the eleven tests the total range of
Quartile II is 98 per cent of Quartile I, etc.

Per Cent Which the Variability of Each Quartile and Each
Tertile is of Those Higher. (I is considered highest)

                                       Quartile                         Tertile
Column                      1     2     3     4     5     6        7     8     9
According to               II    III   IV    III   IV    IV       II    III   III
Values of Q               of I  of I  of I  of II of II of III   of I  of I  of II

Total Range                .98  1.10  1.31  1.12  1.34  1.19     1.11  1.27  1.14
Range Above Median        1.15  1.17  1.42  1.02  1.23  1.21     1.19  1.34  1.13
Range Below Median         .88  1.07  1.25  1.21  1.42  1.17     1.06  1.22  1.15
*Average Range of
  Four Groups             1.12  1.08  1.25   .97  1.12  1.15     1.03  1.20  1.16
Inter-Quartile Range
  (Approximation)         1.09  1.18  1.50  1.08  1.37  1.27     1.13  1.42  1.26
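
The entries of Table XIV can be recomputed from the averages of Table XIII. The sketch below does this for the first row only, using the quartile averages of Table XIII (A); the function name `ratio_table` is hypothetical, and the rounding to two places is an assumption about how the printed figures were obtained.

```python
# Averages of the total range by quartile, from Table XIII (A).
total_range = {"I": 4.35, "II": 4.27, "III": 4.80, "IV": 5.71}

def ratio_table(averages):
    """Variability of each division expressed as a ratio of each higher division."""
    order = list(averages)                 # divisions, highest ranking first
    ratios = {}
    for i, high in enumerate(order):
        for low in order[i + 1:]:
            ratios[f"{low} of {high}"] = round(averages[low] / averages[high], 2)
    return ratios

quartile_ratios = ratio_table(total_range)
```

The six ratios obtained in this way agree with columns 1 to 6 of the row for the total range.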

The data of Table XIV are computed from the averages of
Table XIII. This table summarizes the evidence on the question
as to whether the duller pupils or the brighter pupils have
the greater range in their achievements. Column 3 shows that
on an average the lowest quartile had a range above the individual
medians 42 per cent greater than the first quartile. Further,
the range of the fourth quartile below the individual medians
was only 25 per cent greater than that of quartile one.
However, these two figures, 42 and 25, should not be compared
at face value. By the range of the tests used and the placement
of median ability above the median of the range of the tests
the possibility of large ranges above their medians was limited
for the pupils ranking high, while in all other cases the range
of the tests was sufficient to allow for the maximum individual
range in either direction from the individual median. This
would tend to make the 42 per cent increase in the range of the
fourth quartile over the first somewhat greater than it should be.

The number of pupils making the highest score possible in
the different tests shows that the range of ability covered was
not an important factor in limiting the variability of the pupils
ranking highest. In 21 of the 33 tests the highest score possible
was reached by none of the pupils; in 9 it was made by a relatively
small number; and in only 3 tests was the highest score
possible made by a relatively large number of the pupils. That
the range of ability covered by the tests was not an important
factor in limiting the variability of the highest ranking pupils
is shown further by certain results in Table XIV. The percentage
of increase of the lower quartiles over the higher quartiles
is as great in the case of the approximation of the interquartile
range as in the total range. If the range of ability of
the tests had been operative to any great extent it should have
affected the total range of variability to a noticeably greater extent
than the interquartile range.

Another point should be mentioned in this connection. In
the range below the median there is undoubtedly a factor which
is not present in the range above. Low scores in these tests
are sometimes caused by external conditions, chance occurrences
such as the dropping of a pencil, becoming amused at some part
of a rate test, etc., while high scores are not so caused. High
scores are the result of ability; low scores are the result of either
less ability or the failure of ability to function due to various
causes. Thus in both cases the lower range is increased by this
second factor, which tends to make the percentage of increase
in variability lower.

Allowing for these corrections the results seem to show that the
low ranking pupils of this group are inherently more variable in
their achievement than the pupils ranking high. As to the reason
for the one exception suggested above, namely, the eight Trabue
tests combined in which the low ranking pupils are least variable,
the investigation offers no evidence. It may be that the tests
are better standardized, or that the ability required for these
tests is more specific, or there may be some other reason for the
results. It should be observed, however, that two problems are
involved in this connection. One is the variability of individuals
among different abilities and the other is the variability of
individuals in different testings of the same ability. A large amount
of variability among several traits or abilities does not necessarily
imply great variability among several tests of the same
trait. The Trabue scales test a single trait while the eleven
tests cover several traits. Ability in each trait may remain about
the same relatively from one testing to another and still there
may be great variability among the several tests.

The amount of variability among the different tests cannot
be compared directly with the amount among combined similar
tests, shown in Section D of Table XIII, because in each case
the number of tests combined is different from the number of
different tests² in Section A of the table. This could be accomplished
by some method of weighting but such will not be
attempted here.

2.     Distribution of Individual Variability

Several measures of the amount of individual variability have
been found by taking different single measures of the range
in achievement. The distribution of all the scores above and
below the individual medians will throw more light upon this
problem. Such distributions for the average of the three testings
by tertiles and for the entire group by each testing are
given in Table XV. The frequencies here are expressed in per
cents so that the distributions for the eleven different tests may
be compared later with the distributions for the combined similar
tests. Figs. 16a to 17c show graphically the data in this
table.
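
A distribution of this kind can be sketched as follows. The data are invented and the function name is hypothetical. Each deviation from the individual's own median is placed in an interval half a Q wide, and the frequencies are expressed in per cents of all the scores. For simplicity each interval is labelled here by its lower limit, so that a deviation of -.3 falls in the interval beginning at -.5; this labelling is an assumption of the sketch and differs slightly from the symmetrical labels of the printed table.

```python
import math
from collections import Counter
from statistics import median

def deviation_distribution(individuals, width=0.5):
    """Per cent of scores in each interval above and below the individual medians."""
    counts = Counter()
    for q_scores in individuals:
        med = median(q_scores)
        for score in q_scores:
            deviation = score - med
            counts[math.floor(deviation / width)] += 1
    total = sum(counts.values())
    # Key: lower limit of the interval in Q; value: per cent of all scores.
    return {k * width: 100 * n / total for k, n in sorted(counts.items())}

# Invented transmuted scores of two pupils in three tests each.
pupils = [[-1.0, 0.0, 0.7], [0.5, 1.0, 2.25]]
distribution = deviation_distribution(pupils)
```

Expressing the frequencies in per cents makes the total surface the same for any number of tests, which is what permits the shapes of different distributions to be compared.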

2 These obviously are not all different tests in the sense of testing strictly
different abilities. The two Trabue tests are of course for the same ability
and the mathematics tests are for rather closely related abilities. The use
of the phrase "eleven different tests" will be continued in the study with
this limitation understood.
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TABLE  XV 

Distribution (in Per Cents) of Scores Above and Below the Individual
Medians in the Eleven Tests Transmuted Into Multiples
of Q by the Original Distributions
The Measure of Central Tendency is the Median of the Individual's
Scores Transmuted into Multiples of Q.

Columns: Average of the Three Testings (Feb. 1916, Feb. 1917, June
1917) by Tertiles I, II, III; Entire Number by Testings, Feb. 1916,
Feb. 1917, June 1917.

Value in Q
+5.0 to            .1    .1
+4.5 to +4.9       .1    .1
+4.0 to +4.4       .3    .1    .1    .3
+3.5 to +3.9       .1    .1    .4    .3    .4
+3.0 to +3.4       .4    .9   1.4    .9   1.1    .6
+2.5 to +2.9       .9   1.0   1.1   1.4    .8    .9
+2.0 to +2.4      1.5   3.0   3.7   2.5   2.7   3.0
+1.5 to +1.9      3.4   4.0   7.3   5.8   4.4   4.5
+1.0 to +1.4      6.7   9.5  10.6   9.2   8.4   9.1
 +.5 to  +.9
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Figs.  16a,  16b,  and  16c  represent  the  distributions  of  the 
scores of Tertiles I, II, and III respectively. From the form
of these curves it is evident that in the total distribution of their
scores the pupils ranking lowest, those of the third tertile, are
most variable in their achievements. The mode of the third tertile
is not so pronounced as that of the first tertile. The range
above the median of the third is greater than that of the first,
and the range below the median of the third shows more extreme
cases. All three curves tend to bring out the difference
between the distribution of scores above the median and the
distribution below. The range below is greater and more regular
in its decline. The two halves of the curves show one similarity
which is spurious. All the scores of 5Q or more are
grouped into the last frequency because of the extreme range
in the graph which a few of the scores would have necessitated,
18.7Q in one case. This suggests a relation between the high
and low scores of extreme variability which does not exist.

Figs. 16a to 17c. Distribution (in Per Cents) of Scores Above and Below
the Individual Medians in the Eleven Tests Transmuted into Multiples of
Q by the Original Distributions.

Figs. 16a, b, c. Averages of the Three Testings by Tertiles
    Fig. 16a. Tertile I
    Fig. 16b. Tertile II
    Fig. 16c. Tertile III

Figs. 17a, b, c. Entire Number by Testings
    Fig. 17a. Feb. 1916
    Fig. 17b. Feb. 1917
    Fig. 17c. June 1917

Figs. 17a, 17b, and 17c represent the distribution of scores
above and below the individual medians in February, 1916,
February, 1917, and June, 1917 respectively. Their significance
is in their similarity. There is only one point of difference to
note. It is the greater length of the curve in Fig. 17c when it
nears the base line. The increase is not enough to be especially
significant, and, moreover, it is in that part of the curve which is
least reliable. However, it is in accord with the slight increase
which was found in the Q of the third testing, and tends to
show that these pupils became more variable in their own achievements
the longer they remained in school.
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TABLE  XVI 

Distribution (in Per Cents) of Scores Above and Below the Individual
Medians in Certain Tests Transmuted Into Multiples
of Q by the Original Distributions
The Measure of Central Tendency is the Median of the Individual's Scores
Transmuted into Multiples of Q.

Value in Q
                  19.2  18.5
 -.5 to  -.9      12.8  10.3   8.8
-1.0 to -1.4       9.0   6.5   9.4
-1.5 to -1.9       5.1   3.5   4.7   3.2
-2.0 to -2.4       1.2   3.0   2.2   4.6
-2.5 to -2.9       1.6   1.8    .6    .5
-3.0 to -3.4        .7    .9    .8    .5
-3.5 to -3.9        .3    .5
-4.0 to -4.4        .2    .5
-4.5 to -4.9        .4    .5    .3
-5.0 to             .5    .3

Table XVI gives data for the four groups of combined tests
which are similar to the data of Table XV for the eleven different
tests. The frequencies of scores above and below the individual
medians are expressed in percentages of the total number
of scores in each combined group. Figs. 18 to 21 represent
graphically the distributions of this table. Fig. 17b, representing
the distribution for the eleven tests in February, 1917, is repeated
in order to facilitate comparison.

These figures cannot be compared directly because, as has
already been pointed out, the number of tests is different in
each case. Expressing the frequencies in percentages equates
the surfaces of distribution and permits some inferences to be
drawn concerning the general shape of the curves. Figs. 18
and 19, representing the eight Trabue tests and the six
mathematics tests, are strikingly similar to Fig. 17b, which represents
the eleven tests. They do not show as much variability among
the achievements of the individual as in the case of the eleven
tests, but they show a rather surprisingly large amount of
variability. The mode is more pronounced in that the width of great
density is larger. The extent of the curves and their shape
near the base line are quite similar. The curves of Figs. 20
and 21, representing the five directions tests and the three
reading tests, differ from the others rather markedly. The smaller
number of tests is probably a very potent reason for this,
especially in the latter case.

Figs. 18 to 21. Distribution (in Per Cents) of Scores Above and Below
the Individual Medians in Certain Tests Transmuted into Multiples of Q
by the Original Distributions.

Fig. 18. Eight Trabue Tests Combined
Fig. 19. Six Mathematics Tests Combined
Fig. 20. Five Directions Tests Combined
Fig. 21. Three Reading Tests Combined
Fig. 17b. Feb. 1917 Testing (Repeated)

TABLE  XVII 

The Q of the Distribution of Scores Above and Below the Individual
Medians for the Three Testings and for Certain Tests Combined

Tests                     Q

February, 1916           .81
February, 1917           .79
June, 1917               .80
Eight Trabue             .73
Six Mathematics          .71
Five Directions          .58
Three Reading            .46
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The  Q's  of  Table  XVII  were  calculated  from  the  distribution 
of  scores  above  and  below  the  individual  medians  shown  in 
Tables XXXVII and XXXVIII in the Appendix. They should
be compared with 1.00Q, the variability of the group used as the
standard. The results here are slightly smaller than those of
Table XI because the scores of the less variable pupils beyond
the Q, but still less than the Q of the more variable pupils,
reduce the size of the Q in the total distribution.

TABLE  XVIII 

Distribution by Tertiles of Ranges Above and Below the Individual
Medians in the Eleven Tests in Values of Q
The Measure of Central Tendency is the Median of the Individual's Scores
Transmuted into Multiples of Q.
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TABLE  XIX 
Distribution by Tertiles of Ranges Above and Below the Individual
Medians in Certain Tests in Values of Q
The Measure of Central Tendency is the Median of the Individual's Scores
Transmuted into Multiples of Q.
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Tables  XVIII  and  XIX  and  Figs.  22a  to  25c  are  introduced  to 
supplement the data given in Tables XIII and XIV. "Nothing
short of the entire distribution table is a complete measure of
a variable fact, . . ."³ The first nine columns of Table
XVIII are not separately represented graphically. The last
three columns are the averages of the respective tertiles for the
three testings. Figs. 22a, b, and c show these averages for the
three tertiles of the group. The curves, of course, are bimodal
because they represent two variables, the ranges above and
below the median. They are joined to show the increase in the
extent of the ranges of the pupils of the third tertile over those
of tertiles two and one. The curves show the greater range of
the extreme scores of the third tertile both above and below the
median achievement; this is especially true of the range below the
median. Here again the curves are lopped off at 10Q and more
minus. Finally, these curves show one point upon which Tables
XIII and XIV do not give definite evidence. The 25 per cent
increase below the median in the range of tertile three over
tertile one is not accounted for chiefly by a very few extremely
variant ranges but by the greater variability of this tertile in
general. Further, the few extreme ranges above the median
count still less in effecting the 45 per cent increase in the range
of the third tertile above.

3 Thorndike, E. L., Mental and Social Measurements, p. 36.

Figs. 22a to 23c. Distribution by Tertiles of Ranges Above and Below
the Individual Medians in the Eleven Tests and in the Eight Trabue Tests
in Values of Q. The Measure of Central Tendency is the Median of the
Individual's Scores Transmuted into Multiples of Q.

Average of Three Testings
Fig. 22a. Ranges in Tertile I
Fig. 22b. Ranges in Tertile II
Fig. 22c. Ranges in Tertile III

Eight Trabue Tests
Fig. 23a. Ranges in Tertile I
Fig. 23b. Ranges in Tertile II
Fig. 23c. Ranges in Tertile III

Figs. 24a to 25c. Distribution by Tertiles of Ranges Above and Below
the Individual Medians in the Six Mathematics and Five Directions Tests.

Six Mathematics Tests
Fig. 24a. Ranges in Tertile I
Fig. 24b. Ranges in Tertile II
Fig. 24c. Ranges in Tertile III

Five Directions Tests
Fig. 25a. Ranges in Tertile I
Fig. 25b. Ranges in Tertile II
Fig. 25c. Ranges in Tertile III

Figs.  23a,  b,  and  c  represent  similar  data  for  the  eight  Trabue 
tests;  Figs.  24a,  b,  c  such  data  for  the  six  mathematics  tests; 
and  Figs.  25a,  b,  c  such  data  for  the  five  directions  tests.  The 
same  number  of  cases,  twenty-four  above  and  twenty-four  below 
the  median,  is  represented  by  the  surface  of  each  graph.  The 
figures  for  the  combined  tests  disclose  fewer  extremely  variant 
ranges.  Figs.  23a  and  23c  show  that  the  exception  to  the  in- 
crease in  the  range  of  the  pupils  ranking  low  over  those  ranking 
high,  namely,  in  the  eight  Trabue  tests,  is  not  the  result  of  a 
few  extremely  variant  ranges  in  the  first  tertile,  but  an  inherent 
result  of  the  form  of  distribution. 

The  measures  of  extreme  variability  are  emphasized  not  be- 
cause they  are  thought  to  have  ordinarily  more  significance  than 
measures  of  variability  near  the  central  tendency  but  because 
it  is  one  of  the  chief  purposes  of  this  investigation  to  study  the 
extremely  variant  achievements. 

The results of this topic and of the preceding one also tend
to show that the pupils ranking lowest are most variable.
Before leaving the topic further comparison of these results should
be made with the results obtained by ranks and given in Table
IV. The data of Table IV show that the pupils ranking lowest
are no more variable than are the pupils ranking highest, and
that the pupils ranking nearer the median are the most
variable ones. The average range of quartiles two and three is
shown to be 16 per cent greater than the average range of
quartiles one and four. The range of the fourth quartile below the
median is shown to be less than half as great as the range of the
first quartile below the median, while by the classification by
the Q variability it is shown to be 25 per cent greater than the
range of the first. Likewise with the rest of the results of
this table.

Another  point  should  be  noted  in  this  connection.  Among 
this  group  of  pupils  there  are  no  such  types  or  pronounced  ex- 
tremes as  are  represented  by  Fig.  1,  constructed  from  the  classi- 
fication by  ranks.  The  piling  up  of  scores  at  each  end  of  the 
range  as  shown  in  Cases  60  and  92  is  a  spurious  result  of  the 
method  caused  by  the  failure  to  retain  the  relative  proportions 
of  the  original  distributions. 

This  comparison  is  additional  evidence  that  the  method  of 
evaluating  achievements  in  terms  of  ranks  from  the  highest  to 
the  lowest  in  the  group  does  not  produce  as  reliable  results  in 
connection  with  the  different  achievements  as  does  the  method 
used  in  this  investigation. 

3. Overlapping of Divisions of the Group

There is another question in connection with this part of the
problem of variability of the individual that should be asked
concerning relative variability. Having single measures of the
individual's variability and having the distribution of all his
scores, the difference in ability of the different individuals should
be known. Do the pupils who rank low and vary more in their
achievements than the pupils who rank high differ from those
ranking high only a little in ability, or do they differ a great deal?
This difference can be measured by the per cent of overlapping
of the scores among the different divisions of the group. Table
XX gives the amount of overlapping of each quartile over the
other according to three different points of reference: the
median, the twenty-five percentile, and the seventy-five percentile.

The three different testings, February 1916, February 1917,
and June 1917, of Table XX are not sufficiently differentiated
one from another to offer any points for special notice. Their
overlappings are very similar. Consequently the average may
be considered as typical of the three. The correspondence, not
the variation, is the striking point of the overlappings revealed
by the averages. The per cents of scores of Quartiles IV, III,
and II that exceed the seventy-five percentile of Quartile I show
a very close agreement with the per cents of Quartiles I, II, and
III that extend below the twenty-five percentile of Quartile IV.
The comparisons are: 4.7 with 3.5, 6.3 with 6.1, and 10.3 with
10.3. Similar comparisons using the per cents of the same
quartiles above the median of the first and below the median
of the fourth give: 9.1 with 9.3, 18.6 with 14.1, and 24.7 with
26.9. Other comparisons that might be made would disclose
about the same agreement. This shows that the high scores of
the low pupils overlap the high scores of the high pupils to an
extent that corresponds very closely with the overlapping of the
low scores of the high pupils over the low scores of the low
pupils.
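The measure of overlapping described above can be illustrated by a short sketch in Python, a modern aid that formed no part of the original investigation. The function name `percent_beyond` and the two lists of scores are hypothetical and are introduced only for illustration; the scores are not data from this study. The sketch counts the per cent of one division's scores that pass the median of another division.

```python
from statistics import median


def percent_beyond(scores, reference, above=True):
    """Return the per cent of scores lying above (or below) a reference point."""
    if not scores:
        raise ValueError("scores must not be empty")
    if above:
        count = sum(1 for s in scores if s > reference)
    else:
        count = sum(1 for s in scores if s < reference)
    return 100.0 * count / len(scores)


# Hypothetical scores in multiples of Q (assumed values, not data from the study).
quartile_i = [-1.4, 0.5, 0.8, 1.0, 1.2, 1.5, 1.6, 2.0]       # highest-ranking division
quartile_iv = [-2.5, -2.0, -1.5, -1.2, -1.0, -0.5, 0.9, 1.3]  # lowest-ranking division

# Per cent of the low division's scores above the median of the high division.
low_over_high = percent_beyond(quartile_iv, median(quartile_i), above=True)

# Per cent of the high division's scores below the median of the low division.
high_over_low = percent_beyond(quartile_i, median(quartile_iv), above=False)

print(low_over_high, high_over_low)
```

The same function would serve for the twenty-five and seventy-five percentile points of reference if those values were passed in place of the median.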

TABLE  XXI 

Difference in Achievement Between Quartiles Measured in Terms
of the Q Variability of the Group

The table reads: Between the medians of Quartiles I and II there were
22.2 per cent of the scores of Quartile I and 25.3 per cent of the scores of
Quartile II, etc.

                                                  Per Cent of Scores Between
                                                  the Medians of Quartiles:

                                                  I and II   II and III   III and IV

Per cent of higher quartile overlapping lower       22.2        18.2         23.1
Per cent of lower quartile overlapping higher       25.3        15.5         19.9
Average overlapping                                 23.8        16.9         21.5
Value in Q                                           .95         .65          .84

The results given in Table XXI are calculated from the
averages in Table XX. The following example illustrates the method.
In Quartile I, 27.8 per cent of the scores are below the
median of Quartile II. This leaves 22.2 per cent of the scores
of Quartile I between its median and the median of Quartile
II, etc. The values in Q are taken from a table of "P.E. Values
Corresponding to Given Per Cents of the Normal Surface of
Frequency, Per Cents Being Taken from the Median."⁴ The
results show a slightly greater difference between the first and
second and between the third and fourth quartiles than between
the second and third. In median achievement the fourth
quartile is 2.44Q below the first quartile. This shows that the
variability of the fourth quartile is that of a distinctly lower grade
of work.
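The conversion of a per cent of overlapping into a value in Q may be sketched as follows. In the study the values were read from a printed table of P.E. values; the sketch below computes them instead from the normal curve with Python's standard `statistics.NormalDist`, which is a substitution made here for illustration. The function name `percent_to_q` is hypothetical. Because the printed table was presumably rounded, the computed figures should be expected to agree with Table XXI only to about the second decimal place.

```python
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal curve, mean 0 and sigma 1

# One Q (or P.E.) marks off 25 per cent of the cases on one side of the median.
Q_IN_SIGMA = _NORMAL.inv_cdf(0.75)


def percent_to_q(percent_from_median):
    """Distance from the median, in multiples of Q, that encloses the given
    per cent of a normal surface of frequency on one side of the median."""
    if not 0 <= percent_from_median < 50:
        raise ValueError("per cent from the median must be in [0, 50)")
    z = _NORMAL.inv_cdf(0.5 + percent_from_median / 100.0)
    return z / Q_IN_SIGMA


# Average overlapping between adjacent quartile medians, from Table XXI.
average_overlap = {"I and II": 23.8, "II and III": 16.9, "III and IV": 21.5}

values_in_q = {pair: percent_to_q(p) for pair, p in average_overlap.items()}
total_gap = sum(values_in_q.values())  # distance from Quartile I to Quartile IV

for pair, value in values_in_q.items():
    print(pair, round(value, 2))
print("I to IV", round(total_gap, 2))
```

Summing the three steps gives the distance between the medians of the first and fourth quartiles, which is how the figure of about 2.44Q in the text is reached.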

There  still  remains  another  question  of  interest  and  impor- 
tance, namely:  On  the  basis  of  individual  achievement  in  the 
eleven  tests  how  far  from  zero  ability  in  the  traits  measured 
are  the  different  divisions  of  this  group?  The  answer  to  this 
would  round  out  this  section  of  the  problem.  It  would  mean 
that  any  score  of  an  individual  could  be  related  not  only  to  his 
other  scores  and  to  the  scores  of  other  individuals  of  the  group, 
but  also  that  its  absolute  value  could  be  determined.  These  ab- 
solute values  could  be  determined  for  the  scores  of  the  tests 
that  have  been  built  by  scaling  achievements  from  the  zero  point, 
but  for  the  others  they  could  only  be  estimated.  Therefore,  this 
part  of  the  problem  will  have  to  be  left  unanswered.  This  serves 
to  emphasize  the  need  for  more  tests  scaled  from  zero  for  the 
problems  in  educational  diagnosis. 

For  the  purpose  of  individual  diagnosis  the  variability  of  a 
test  should  be  standardized  either  by  grade  or  by  age  of  the 
pupils.  Having  such  a  measure  the  scores  of  an  individual  could 
be  compared  by  transmuting  them  into  multiples  of  this  vari- 
ability without  the  labor  involved  in  this  investigation  of  deter- 
mining a  measure  of  variability  by  testing  a  group.  A  large 
number  of  cases  would  reduce  the  unreliability  of  the  measure 
of  variability  to  a  very  small  amount  and  would  make  it  possible 
to  secure  very  reliable  measures  of  the  relative  achievements 
of  the  individual. 
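The transmutation proposed here may be sketched briefly in Python. The function names and the scores are hypothetical, and the quartile rule used (`method="inclusive"` in `statistics.quantiles`) is an assumption; the original computations were made by hand and may have located the quartile points somewhat differently. The sketch finds the Q of a standardizing group and expresses an individual's score as a multiple of that Q above or below the group median.

```python
from statistics import median, quantiles


def semi_interquartile_range(scores):
    """Q: half the distance between the 25 and 75 percentile points."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2.0


def transmute(score, group_median, group_q):
    """Express a raw score as a signed multiple of Q from the group median."""
    if group_q <= 0:
        raise ValueError("Q must be positive")
    return (score - group_median) / group_q


# Hypothetical raw scores of a standardizing group (assumed, not study data).
group_scores = [10, 12, 14, 16, 18, 20, 22, 24, 26]

group_median = median(group_scores)
group_q = semi_interquartile_range(group_scores)

# Two hypothetical scores made by one pupil in this test.
high_score = transmute(30, group_median, group_q)
low_score = transmute(12, group_median, group_q)
print(group_median, group_q, high_score, low_score)
```

With a standardized Q for each grade or age, only the second function would be needed, since the median and Q would be taken from the published standard rather than computed from a tested group.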

Questions  3,  4,  and  5  in  the  statement  of  the  problem,  con- 
cerning the  amount  of  the  individual's  variability,  the  distribu- 
tion of  individual  variability,  and  the  variability  of  bright, 
mediocre,  and  dull  pupils,  have  been  considered  in  this  section 
of  the  study.     The  average  amount  of  variability  of  the  sub- 

4  Trabue,  M.  R.,  Completion-Test  Language  Scales,  Teachers  College, 
Columbia  University,  Contributions  to  Education,  No.  77,  p.  3S. 
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jects  of  this  investigation  for  the  three  testings  in  the  eleven 
tests  used  has  been  found  to  be  eighty-two  per  cent  of  the  varia- 
bility of  the  group  in  the  same  tests.  The  variability  in  the 
last  testing  is  slightly  greater  than  in  the  first  testing.  The 
distribution  of  the  achievements  of  the  individual  approximates 
the  normal  surface  of  frequency,  the  chief  differences  being  a 
more  pronounced  mode  and  skewness  downward  from  the  me- 
dian. On  the  basis  of  three  equal  divisions  of  the  group  the 
bright  pupils  are  least  variable  and  the  dull  pupils  are  most 
variable  in  their  achievements.  The  distributions  of  the  achieve- 
ments for  all  three  divisions  have  the  same  general  form. 


VI 

EXTREME   VARIABILITY  IN   INDIVIDUAL   CASES 

1.     Extreme  Variability  in  Different  Tests 

Mention has already been made of certain probable causes of
low scores, such as distractions of the moment due to chance
occurrences, and abnormal mental or physical condition of the
individual at the time of the test. It has also been pointed out
that these factors are not effective in producing high scores, or
if effective at all, only to a very slight extent in comparison
with their effect in causing low scores. The effect of chance
happenings and abnormal conditions upon the achievement of
the pupil can be ascertained to some extent by classifying the
extremely variable or erratic scores and also the boys who make
them, and by comparing the results from re-examination under
more closely controlled conditions with the original
achievements.

As  can  readily  be  seen  from  the  chart  in  Fig.  15  there  are  no 
distinct  types  of  scores  or  individuals.  Therefore  the  line  di- 
viding extreme  variability  from  the  rest  of  the  distribution  must 
be  arbitrarily  drawn.  A  distance  of  3Q  from  the  individual 
medians  was  chosen  for  the  location  of  this  line.  It  was  placed 
here  because  at  about  this  point  is  the  beginning  of  the  second 
slow  decrease  in  the  normal  curve  of  probability  as  characterized 
by  a  "slow-rapid-slow"  decline  in  either  direction  from  the 
median.  It  includes  47.8  per  cent  of  the  scores  on  either  side 
of  the  median.  Scores  3Q  or  more  from  the  median  in  each 
direction  will  be  called  erratic  either  plus  or  minus.  In  Table 
XXII  all  the  erratic  scores  in  the  three  testings  are  classified 
by  testing  for  each  test  and  by  total  and  average  for  each  test. 
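The choice of 3Q as the dividing line can be checked against the normal curve with a short sketch in Python. The function names are hypothetical, and the use of `statistics.NormalDist` in place of a printed table is a substitution made for illustration. The sketch computes the per cent of a normal distribution lying between the median and a given multiple of Q on one side, and labels a deviation as erratic plus, erratic minus, or not erratic.

```python
from statistics import NormalDist

_NORMAL = NormalDist()
Q_IN_SIGMA = _NORMAL.inv_cdf(0.75)  # one Q expressed in units of sigma


def percent_within(multiple_of_q):
    """Per cent of a normal distribution between the median and the given
    multiple of Q on one side of the median."""
    return 100.0 * (_NORMAL.cdf(multiple_of_q * Q_IN_SIGMA) - 0.5)


def classify(deviation_in_q, threshold=3.0):
    """Label a score by its signed distance from the individual median."""
    if deviation_in_q >= threshold:
        return "erratic plus"
    if deviation_in_q <= -threshold:
        return "erratic minus"
    return "not erratic"


print(round(percent_within(3.0), 1))
print(classify(3.4), classify(-3.0), classify(1.2))
```

The first figure should come out close to the 47.8 per cent stated in the text for each side of the median.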

The  totals  at  the  bottom  of  Table  XXII  show  an  increase  in 
the  number  of  erratic  or  extremely  variable  scores  both  plus  and 
minus  in  each  of  the  two  later  testings.  Other  things  being  equal 
this  would  show  that  the  individual's  abilities  to  achieve  in  these 
tests  had  increased  at  different  rates.  If  the  identical  tests  or 
tests  of  equated  values  had  been  used  for  the  second  and  third 
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TABLE  XXII 
Number of Scores in the Different Tests 3 Q or More Plus or Minus


Scores  3  Q 
or  More 

Scores  3  Q 
or  More 

i 

Total  No. 
of  Scores 

Average  No. 
of  Scores 

Plus 

Minus 

3 

05 
1 

3Q 

or  More: 

3  Q  or  More. 

2 

g 

Test 

'I 

o 

1 

S 

as 

s 

2 
^ 

OS 

S 

Multiplica- 
tion 

1 

3 

-;!:• 

4 

2 

«• 

2 

4 

6 

10 

2.0 

3.0 

5.0 

Division 

1 

« 

1 

5 

a 

2 

1 

6 

7 

.5 

3.0 

3.5 

Algebra 
Add.  Subt. 

s 

0 

6 

9 

■:;:- 

2 

1 

6 

2 

8 

6.0 

2.0 

8.0 

Algebra 
Mult.  Div. 

■s 

«- 

1 

« 

o 

1 

1 

1 

1.0 

1.0 

Trabue 
(Both  tests) 

3 

3 

5 

2 

2 

8 

11 

4 

15 

1.8 

.7 

2.5 

Reading 

1 

1 

3 

3 

2 

3 

5 

.7 

1.0 

1.7 

Visual 
Vocabulary 

1 

1 

6 

3 

1 

7 

8 

.3 

2.3 

2.7 

Composition 

1 

3 

1 

3 

4 

1 

5 

1.3 

.3 

1.6 

Spelling 

6 

4 

7 

3 

17 

17 

5.6 

6.6 

Opposites 

2 

8 

1 

3 

11 

11 

3.7 

3.7 

Mixed 
Relations 

1 

4 

6 

3 

11 

11 

3.7 

3.7 

Easy 
Directions 

1 

4 

5 

3 

1 

9 

10 

.3 

3.0 

3.3 

Total 


8    11  12 


18 


27  32 


31 


77      108 


* This type of test not given.
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testings  the  absolute  amount  of  gain  in  each  could  be  deter- 
mined. Such  evidence  would  be  more  reliable  than  the  evidence 
obtained, which measures the individual's increase in ability in
relation  to  the  rest  of  the  group. 

This increase in the number of erratic scores might also be
accounted for by the piling up of scores at the mode to a greater
extent in the later testings, thus reducing the extent of the Q
and thereby increasing the transmuted value of a deviation of
the same absolute amount in all three testings. Table IX
contains some evidence on this point, but not enough to decide it
either way.

In  the  composition  tests  the  Q  in  terms  of  the  same  scale  is 
smaller  in  the  last  two  testings  than  in  the  first.  In  the  first 
testing  there  are  two  erratic  scores,  one  plus  and  one  minus; 
in  the  second  testing  there  are  three  erratic  scores;  and  in  the 
third  testing  there  are  no  erratic  scores.  In  reading,  the  iden- 
tical test,  Alpha  2,  Part  II,  was  repeated  in  the  second  testing. 
The  Q  is  slightly  larger  in  the  second  testing  than  in  the  first 
testing  and  the  number  of  erratic  scores  is  the  same.  The 
reading  test  of  the  third  testing  was  composed  of  different  se- 
lections and  therefore  the  Q  can  not  be  compared  with  the  Q 
of  Alpha  2.  The  number  of  words  in  the  spelling  tests  was  the 
same  throughout.  The  first  test  was  the  easiest  and  has  a  small- 
er Q  than  the  last,  showing  a  greater  piling  up  of  scores,  but  still, 
this  Q  which  is  much  less  than  that  of  the  last  test  lacks  one 
of  producing  as  many  erratic  scores  as  there  are  in  the  last  test. 
Opposed  to  these  results  the  smaller  Q  of  the  second  opposites 
test  produces  decidedly  more  erratic  scores  than  the  larger  Q's 
of  the  other  two  tests.  Other  examples  could  be  cited  showing 
either  result. 

The results of Tables IX and XXII, in so far as they bear on
this question, show that there was no marked reduction of
variability caused by the repetition of the tests and therefore that
the increase in the number of erratic or extremely variable scores
in the later testings was not caused to a large extent by smaller
Q's.

Table  XXII  shows  that  in  all  three  testings  there  were  108 
erratic  scores.  Of  these  29  per  cent  were  plus  and  71  per  cent 
were  minus.     This  gives  an  average  number  of  erratic  scores 
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in  each  testing  which  is  4.5  per  cent  of  the  total  number  in  each 
testing.  That  is,  of  every  hundred  scores  four  and  one  half 
were  3Q  or  more  from  the  individual  medians.  It  shows  that 
the  curves  were  skewed  downward  for  in  the  normal  surface  of 
frequency  only  2.2  per  cent  of  items  are  beyond  3Q. 
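The arithmetic behind these percentages can be retraced in a few lines of Python, given here only as a check. The counts of 31 plus and 77 minus erratic scores are the totals of Table XXII. The number of scores in each testing is taken as 72 boys times eleven tests, which is an assumption made for this sketch; it reproduces the 4.5 per cent stated in the text.

```python
# Totals from Table XXII.
erratic_plus = 31
erratic_minus = 77
erratic_total = erratic_plus + erratic_minus

# Assumed size of one testing: 72 boys, each with eleven test scores.
boys = 72
tests_per_testing = 11
testings = 3
scores_per_testing = boys * tests_per_testing

percent_plus = 100.0 * erratic_plus / erratic_total
percent_minus = 100.0 * erratic_minus / erratic_total

# Average number of erratic scores in one testing, as a per cent of its scores.
average_erratic_per_testing = erratic_total / testings
percent_erratic = 100.0 * average_erratic_per_testing / scores_per_testing

print(erratic_total, round(percent_plus), round(percent_minus))
print(round(percent_erratic, 1))
```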

The  last  three  columns  of  Table  XXII  give  the  average  num- 
ber of  scores  plus,  minus,  and  plus  and  minus  in  each  group  of 
closely  related  tests.  The  first  two  of  the  three  columns  are 
the  more  significant.  They  show  that  in  the  rate  tests  practic- 
ally all  the  erratic  scores  are  minus.  It  would  be  expected  that 
they  would  show  more  erratic  scores  minus  than  plus  because 
distractions  of  the  moment  operate  in  this  direction  and  affect 
rate  tests  most  of  all.  Excepting  the  Algebra  Addition  and 
Subtraction  test  which  was  given  but  once  and  which,  more- 
over, was  in  process  of  construction,  spelling  caused  more  er- 
ratic scores  than  any  other  test,  and  all  of  these  were  erratic 
in  the  minus  direction. 

The  number  of  erratic  scores  resulting  cannot  be  taken  as 
a  criterion  for  judging  the  unreliability  of  a  test  except  in  cases 
where  scores  are  caused  by  chance  happenings.  Within  limits 
the  possibility  for  such  results  in  a  test  would  appear  to  have 
an  inverse  relation  to  the  reliability  of  the  test.  The  possibility 
of  fine  discrimination  in  achievements  and  the  possibility  for 
the  functioning  of  a  wide  range  of  ability  appear  to  be  two  fac- 
tors which  have  a  direct  relation  to  the  value  of  a  test  in  educa- 
tional diagnosis  of  the  individual. 

Table  XXIII  summarizes  the  results  in  the  first  half  of  Table 
XXII  in  a  different  way  from  that  in  which  they  are  summar- 
ized in  the  last  half  of  that  table.  It  shows  the  erratic  scores 
plus  and  minus  by  tertile  and  total  for  each  testing.  The  sig- 
nificance of  the  table  is  in  the  increase  in  the  number  of  erratic 
scores  both  plus  and  minus  of  the  second  tertile  over  the  first 
and  of  the  third  tertile  over  the  second.  The  numbers  of  erratic 
scores  plus  are  4,  10,  and  17,  and  the  numbers  of  erratic  scores 
minus  are  20,  21,  and  36  for  Tertiles  I,  II,  and  III,  respectively. 
Of  the  total  number  of  erratic  scores,  the  per  cent  in  each  tertile 
is  as  follows :  first  tertile,  22  per  cent ;  second,  29  per  cent ;  and 
third,  49  per  cent. 
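As a check on the figures just given, the share of erratic scores falling in each tertile may be recomputed from the counts stated in the text. The sketch below is in Python and the variable names are merely illustrative.

```python
# Erratic scores by tertile, as stated in the text (Tertiles I, II, III).
plus_by_tertile = [4, 10, 17]
minus_by_tertile = [20, 21, 36]

totals_by_tertile = [p + m for p, m in zip(plus_by_tertile, minus_by_tertile)]
grand_total = sum(totals_by_tertile)

# Per cent of all erratic scores that fall in each tertile.
percent_by_tertile = [round(100.0 * t / grand_total) for t in totals_by_tertile]

print(totals_by_tertile, grand_total, percent_by_tertile)
```

The rounded results agree with the 22, 29, and 49 per cent reported for the first, second, and third tertiles.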
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TABLE  XXIII 

Number of Scores 3 Q or More Plus or Minus by Tertiles and
Total for Each Testing

                  Tertile I             Tertile II            Tertile III
              Plus  Minus  Total    Plus  Minus  Total    Plus  Minus  Total    Total

Feb. 1916       2      5      7       4      6     10       2      7      9       26
Feb. 1917       0      7      7       4      5      9       7     15     22       38
June 1917       2      8     10       2     10     12       8     14     22       44
Total           4     20     24      10     21     31      17     36     53      108

2.    Extreme  Variability  of  Different  Boys 

The  classification  of  erratic  scores  by  test  in  which  they  oc- 
curred is  only  part  of  their  description.  Under  this  topic  an- 
other part  is  given, — the  classification  by  boys  who  made  them. 
The  following  questions  are  considered:  What  per  cent  of  the 
boys  made  erratic  scores?  Were  there  more  or  fewer  boys  who 
made  erratic  scores  the  longer  they  remained  in  school?  If  a 
boy  has  erratic  scores  in  one  testing  what  is  the  expectancy  of 
his  having  erratic  scores  in  one  or  both  of  the  other  testings? 
How  do  the  high,  median,  and  low  ranking  divisions  compare  as 
to  the  number  of  boys  making  erratic  scores? 

Table XXIV gives the number of boys making erratic scores
in each test in one testing only and in all the combinations of
testings. For example, the three boys counted in spelling in
next to the last column of the table are not counted under the
separate years. The results shown in this table are not different
from what would be expected in the light of Table XXII.
Spelling and the rate tests (opposites, mixed relations, and easy
directions) show the largest number of boys making erratic
scores minus. In comparing these totals, division by the
number of times the tests were given is implied. The only point
that should be noted in connection with the number of boys
making erratic scores plus is the complete lack of such in the
tests just mentioned, spelling and the rate tests, except one
case in easy directions.

TABLE  XXIV 

Number  of  Boys  Having  Scores  3  Q  or  More  Plus  or  Minus  in  Each 

Type  of  Test  in  Either  One  or  More  Testings 


3Q 

or  more  p 

lus 

3  Qor  more  minus 

CO 

o 

t^ 

^ 

0 

s 
0 

0 

CO 

"« 
g 

0 

1 

Multiplication 

1 

3 

s- 

4 

4 

2 

■:::■ 

6 

2 

Division 

1 

-:::■ 

1 

4 

•Si 

1 

5 

2 

Algebra  Add.  Subt. 

® 

* 

6 

6 

•:;:■ 

2 

2 

1 

Algebra  Mult.  Div. 

•:■:• 

• 

1 

1 

«■ 

•S:- 

1 

Trabue (both tests)

3 

3 

5 

11 

2 

2 

4 

6 

Reading 

1 

1 

2 

3 

3 

3 

Visual  Vocabulary 

1 

1 

6 

7 

3 

Composition 

1 

3 

4 

1 

3 

Spelling 

3 

1 

1 

3 

9 

3 

Opposites 

7 

1 

1 

10 

3 

Mixed  Relations 

3 

5 

1 

10 

3 

Easy  Directions 

1 

1 

3 

4 

1 

8 

3 

*  This  type  of  test  not  given. 


TABLE  XXV 


Number of Boys Making Different Numbers of Scores 3 Q or More
Plus or Minus in All Three of the Testings. Each
Boy is Counted Only Once

No. of Boys     No. of Scores

    17               0
    24               1
    18               2
     6               3
     5               4
     2               5
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Table  XXV  shows  the  number  of  boys  making  different  num- 
bers of  erratic  scores  in  all  three  testings  combined.  The  re- 
sults show  that  76  per  cent  of  the  boys  made  one  or  more  erratic 
scores  in  all  of  the  three  testings.  However,  the  impression 
given  by  this  percentage  is  not  quite  fair.  Too  great  a  penalty 
is  placed  upon  the  making  of  one  erratic  score  in  any  one  of 
the  three  testings.  This  measure  should  be  supplemented  by  the 
average  of  the  three  testings.  Table  XXVI  shows  that  in  the 
testings  taken  separately  there  were  25,  30,  and  34  boys  respect- 
ively who  made  erratic  scores.  These  numbers  give  an  average 
for  the  three  testings  of  41  per  cent  of  the  boys  who  made  erratic 
scores. 
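The two measures contrasted in this paragraph can be recomputed from the tables with a short Python sketch, offered only as a check on the arithmetic. The distribution is that of Table XXV and the counts of 25, 30, and 34 boys are those quoted from Table XXVI; the variable names are illustrative.

```python
# Table XXV: number of erratic scores -> number of boys making that many.
boys_by_score_count = {0: 17, 1: 24, 2: 18, 3: 6, 4: 5, 5: 2}

total_boys = sum(boys_by_score_count.values())
total_scores = sum(n * boys for n, boys in boys_by_score_count.items())

# Boys with at least one erratic score in the three testings combined.
boys_with_any = total_boys - boys_by_score_count[0]
percent_with_any = 100.0 * boys_with_any / total_boys

# Boys making erratic scores in each separate testing (quoted from Table XXVI).
boys_per_testing = [25, 30, 34]
average_percent = 100.0 * sum(boys_per_testing) / len(boys_per_testing) / total_boys

print(total_boys, total_scores, round(percent_with_any), round(average_percent))
```

The first percentage counts a boy as erratic if he made a single such score in any testing, which is why it stands so much higher than the average for a single testing.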

Table  XXVI  also  answers  the  question  as  to  whether  more 
or  fewer  boys  made  erratic  scores  the  longer  they  remained  in 
school,  showing  that  in  the  second  and  third  testings  the  num- 
ber was  increasingly  greater.  In  February,  1916,  35  per  cent 
made  one  or  more  erratic  scores;  in  February,  1917,  42  per  cent; 
and  in  June,  1917,  47  per  cent. 

Another  question  arises  in  this  connection:  Do  the  pupils 
who  make  erratic  scores  make  more  or  fewer  per  pupil  in  the 
later  testings?  From  the  data  of  Tables  XXIII  and  XXVI  it 
is  found  that  the  number  of  erratic  scores  per  pupil  making 
erratic  scores  in  the  first  testing  is  1.04,  in  the  second  testing, 
1.27,  and  in  the  third  testing,  1.29.  The  rest  of  the  data  in- 
cluded in  Table  XXVI  show  the  number  of  scores  plus  and  the 
number  minus  made  by  every  boy  in  each  testing  and  in  all 
three  testings  combined.  The  table  reads :  In  February,  1916,  47 
boys  made  neither  plus  nor  minus  erratic  scores;  7  made  one 
plus  score  each  and  no  minus  scores;  17  made  one  minus  score 
each  and  no  plus  scores;  and  1  made  one  plus  score  and  one 
minus  score. 
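The figures for the separate testings may be retraced in the same manner. In the Python sketch below, the numbers of boys are those quoted from Table XXVI, the numbers of erratic scores are the testing totals of Table XXIII, and the group of 72 boys is taken from the text; the variable names are illustrative.

```python
testings = ["Feb. 1916", "Feb. 1917", "June 1917"]
boys_making_erratic = [25, 30, 34]   # from Table XXVI
erratic_scores = [26, 38, 44]        # testing totals from Table XXIII
total_boys = 72

# Per cent of the boys who made one or more erratic scores in each testing.
percent_of_boys = [round(100.0 * b / total_boys) for b in boys_making_erratic]

# Erratic scores per boy, counting only the boys who made any.
scores_per_boy = [
    round(s / b, 2) for s, b in zip(erratic_scores, boys_making_erratic)
]

for name, pct, rate in zip(testings, percent_of_boys, scores_per_boy):
    print(name, pct, rate)
```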

Table  XXVII  analyzes  the  number  of  boys  opposite  each  num- 
ber of  testings  accordingly  as  they  made  only  plus,  only  minus, 
or  both  plus  and  minus  scores  in  the  different  testings.  The 
last  case  at  the  bottom  of  the  table  is  interesting.  In  one  test- 
ing this  boy  made  one  or  more  erratic  scores  plus,  but  none 
minus ;  in  another  testing  he  made  one  or  more  minus,  but  none 
plus;  and  in  the  remaining  one  of  the  three  testings  he  made 
both  plus  and  minus  erratic  scores.     The  table  shows  that  the 
TABLE XXVI

Number of Boys Making Scores 3 Q or More Plus or Minus and the
Number of Scores of Either Type That Each Boy Made


Feb. 1916

                        3 Q or More Plus
3 Q or More Minus       0      1      2      3

        0              47      7
        1              17      1
        2
        3

Feb. 1917

                        3 Q or More Plus
3 Q or More Minus       0      1      2      3

        0              42      6      1
        1              16      3
        2               4
        3

June  1917 

The  Three 

Testings 

3  Qor 

P  I  u  s 

3Q 

or 

PI 

tt  s 

More 

0 

1         2 

3 

Blore 

0 

1 

2 

5     4     5 

0 

38 

7         1 

0 

17 

7 

1 

1 

™      1 

19 

2 

1 

17 

9 

1 

3 

3 

1 

CD 

2 
3 

8 
1 

3 

2 

3 

1 

4 
5 

3 
1 

1 

erratic  scores  made  by  one  individual  are  not  confined  to  one 
type.  In  the  three  testings  16  boys  made  erratic  scores  both 
plus  and  minus,  and  7  of  these  made  erratic  scores  both  plus 
and  minus  in  the  same  testing.  There  were  30  boys  who  made 
only  minus  erratic  scores  and  9  who  made  erratic  scores  in  the 
plus  direction  only.  Of  the  72  pupils  who  were  tested  76  per 
cent  made  one  or  more  erratic  scores;  42  per  cent  made  erratic 
scores  in  one  testing  only;  22  per  cent  in  two  testings;  and  12 
per  cent  in  all  three  testings.  Of  the  boys  who  showed  this 
amount  of  variability  in  their  achievements  in  one  testing,  46 
per  cent  showed  it  again  in  either  one  or  both  of  the  other  two 
TABLE XXVII

Number of Boys Making Scores 3 Q or More Plus, Minus, and Plus
and Minus in One or More of the Three Testings. Each
Boy is Counted Only Once


                               No. of     Type of
                               Boys       Scores Made

None of the Three Testings       17

One of the Three Testings        30
                                  8       +
                                 18       -
                                  4       ±

Two of the Three Testings        16
                                  1       +  +
                                  8       -  -
                                  6       +  -
                                  1       +  ±

All Three Testings                9
                                  4       -  -  -
                                  2
                                  1
                                  1
                                  1       +  -  ±

testings.  These  figures  show  that  a  large  percentage  of  the  boys 
made  extremely  variable  scores, — scores  of  3Q  or  more  above 
and  below  their  median  achievements. 
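
The per cents in the preceding paragraph follow from the numbers of boys in Table XXVII (17, 30, 16, and 9 boys making erratic scores in none, one, two, and all three of the testings). A brief sketch of that arithmetic; the names are illustrative.

```python
# Boys by the number of testings in which they made erratic scores (Table XXVII).
boys_by_testings = {0: 17, 1: 30, 2: 16, 3: 9}

total = sum(boys_by_testings.values())
erratic = total - boys_by_testings[0]
share = {k: 100 * n / total for k, n in boys_by_testings.items()}
erratic_share = 100 * erratic / total
# Of the boys erratic in at least one testing, the share erratic in two or three.
repeat_share = 100 * (boys_by_testings[2] + boys_by_testings[3]) / erratic

print(f"Erratic in one or more testings: {erratic_share:.1f} per cent of {total}")
print(f"Erratic again in another testing: {repeat_share:.1f} per cent of {erratic}")
```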

One  more  question  asked  at  the  beginning  of  this  topic  re- 
mains to  be  answered :  How  do  the  high,  median,  and  low  divi- 
sions compare  in  the  number  of  boys  making  erratic  scores? 
The  answer  could  be  predicted  from  Table  XXIII.  Table 
XXVIII  gives  the  facts.  In  every  testing  the  second  tertile  has 
more  boys  making  erratic  scores  than  does  the  first  tertile,  and 
in  every  testing  the  third  tertile  has  most  of  all,  with  one  ex- 
ception, February,  1916,  when  there  were  more  in  the  second 
tertile.  The  totals  show  a  consistent  increase  in  the  number. 
Using  the  average  number  of  boys  who  made  erratic  scores  it 
is  found  that  24  per  cent  are  in  the  first  tertile,  32  per  cent  are 
in  the  second  tertile,  and  44  per  cent  are  in  the  third  tertile. 
TABLE XXVIII

Numbers of Boys Having Scores 3 Q or More Either Plus, or Minus,
or Plus and Minus by Tertiles and Total for Each Testing


                   Tertile I                  Tertile II                  Tertile III                   Total

                           Plus                        Plus                        Plus                        Plus
                            and                         and                         and                         and
             Plus  Minus  Minus  Total   Plus  Minus  Minus  Total   Plus  Minus  Minus  Total   Plus  Minus  Minus  Total

Feb. 1916       1      4      1      6      4      6            10      2      7             9      7     17      1     25
Feb. 1917              6             6      3      4      1      8      4     10      2     16      7     20      3     30
June 1917       1      7      1      9      1      9      1     11      6      7      1     14      8     23      3     34
Total *         2     17      2     21      8     19      2     29     12     24      3     39     22     60      7     89

* In the totals for the three testings the same boy may be counted more than once.


3.    Reduction  of  Variability  by  Re-examination 

In  the  preceding  topic  of  this  section  it  has  been  found  that 
of  all  the  erratic  scores  made  71  per  cent  were  minus  and  29  per 
cent  were  plus,  and  that  of  all  the  boys  making  erratic  scores  16 
per  cent  made  plus  scores,  55  per  cent  minus  scores,  and  29  per 
cent  made  both  plus  and  minus  scores.  The  problem  here  is 
to  determine  the  reduction  in  the  number  of  erratic  scores  in 
these  same  tests  which  a  special  examination  under  closely  con- 
trolled conditions  would  produce. 

The  tests  used  in  the  special  examination  were  identical  with 
those  used  in  the  original  testings.  They  were  given  about  three 
weeks  after  the  third  testing.  The  time  allowed  was  as  nearly 
equal  to  the  time  in  the  original  testing  as  was  possible.  Espe- 
cial care  was  taken  to  insure  the  subject's  best  reaction  in  accord- 
TABLE XXIX

Comparison of Scores in Original and Special Tests of Certain Boys
Having Scores 3 Q or More Minus in Original Tests


                                                   Scores       Scores       Per Cent of    Reduction in Per Cent
                                                   more than    less than    Scores more    of Scores more
                   No. of               No. of     3 Q          3 Q          than 3 Q       than 3 Q
Tests              Boys                 Scores     Minus        Minus        Minus          Minus

Spelling             3      Original       9          8            1           88.9
                            Special        9          5            4           55.6            33.3

Opposites            4      Original      12          5            7           41.7
                            Special        8          3            5           37.5             4.2

Mixed Relations      5      Original      15          5           10           33.3
                            Special       10          0           10            0              33.3

Easy Directions      4      Original      12          5            7           41.7
                            Special        8          1            7           12.5            29.2

ance  with  the  directions  of  the  test.  These  tests  are  described 
in  Section  III  under  Special  Testing.  Re-examination  of  all 
the  boys  who  made  erratic  scores  in  the  tests  in  which  they 
made  them  would  have  produced  the  most  reliable  results.  This, 
however,  was  inexpedient  and  consequently  only  a  part  of  the 
group  were  re-examined. 

In  Table  XXIX  the  results  secured  in  the  four  tests  used  in 
the  special  examination  are  compared  with  the  results  of  the 
original  testings.  The  values  in  Q  for  all  the  tables  in  this  topic 
were  calculated  by  using  the  Q  of  the  original  distributions. 
Since  the  number  of  scores  obtained  in  the  special  testing  is 
not  the  same  in  every  case  as  in  the  original  testings  and  also 
because  the  number  is  not  the  same  in  all  tests,  the  gains  made 
are  expressed  in  per  cents.     These  are  shown  in  the  last  column 
of the table under Reduction in Per Cent of Erratic Scores.
The  special  examination  produced  a  reduction  of  22  per  cent  in 
the  number  of  erratic  scores  on  the  basis  of  the  total  number  of 
scores  made.  All  of  this  reduction  of  variability  should  not  be 
credited  to  the  elimination  of  accidental  or  unusual  occurrences. 
Some  of  it  is  probably  due  to  improvement  through  practice,  es- 
pecially in  the  case  of  the  tests  which  had  been  used  in  the  third 
testing. 
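
The reductions in the last column of Table XXIX, and the 22 per cent for all four tests together, can be reproduced from the counts of scores more than and less than 3 Q minus. A minimal sketch, assuming the total number of scores in each row is the sum of those two counts; the dictionary layout is illustrative.

```python
# (scores more than 3 Q minus, scores less than 3 Q minus) from Table XXIX.
table_xxix = {
    "Spelling":        {"original": (8, 1),  "special": (5, 4)},
    "Opposites":       {"original": (5, 7),  "special": (3, 5)},
    "Mixed Relations": {"original": (5, 10), "special": (0, 10)},
    "Easy Directions": {"original": (5, 7),  "special": (1, 7)},
}

def per_cent_erratic(more, less):
    """Per cent of all scores that are more than 3 Q minus."""
    return 100 * more / (more + less)

reductions = {
    test: per_cent_erratic(*rows["original"]) - per_cent_erratic(*rows["special"])
    for test, rows in table_xxix.items()
}

def pooled(kind):
    """Per cent erratic when the four tests are pooled."""
    more = sum(rows[kind][0] for rows in table_xxix.values())
    less = sum(rows[kind][1] for rows in table_xxix.values())
    return per_cent_erratic(more, less)

overall_reduction = pooled("original") - pooled("special")

for test, r in reductions.items():
    print(f"{test}: {r:.1f}")
print(f"All tests: {overall_reduction:.1f}")
```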

TABLE  XXX 

Comparison of Scores of Certain Boys in Special Tests With Their
Scores  in  Corresponding  Original  Tests 


Test

No. of Boys

No. of Scores

Difference Between Score in Special Test and Original Test: Gain, Loss

Difference in Value in Q Between Special and Original Test: Gain, Loss

Rank in Ability

Rank in Variability

Spelling 

3 

9 

7 

0 

5 

4.3 

0 

1.5 

59 

63.5 

18 

8 

3 

1.3 

2.1 

1.0 

58 

72 

16 

3 

5 

10.0 

1.0 

1.3 

65 

56.5 

Opposites 

4 

8 

0 

4 
1 
4 
3 

1 
1 
1 

0 

2.6 

.3 

2.6 

1.9 

2.1 
1.3 
2.1 

60 
15.5 
69 
48.5 

44 
46 
50 
65 

Mixed 

5 

10 

13 

1 

4.4 

.4 

35 

61.5 

Relations 

7 

11 

2 

8 

13 
9 

1 
4 

2.3 

3.8 

.7 

2.7 

4.4 
3.0 

.4 
1.4 

71 
15.5 
67 
14 

45 
56.5 
55 
49 

Easy 

4 

8 

6 

7 

2.5 

4.7 

72 

69 

Directions 

4 
4 

1 

8 
8 
5 

1.7 

1.6 

.4 

5.4 
5.3 
3.4 

64 
41 

68 

26.5 

43 

71 

Table  reads:  In  spelling  3  boys  were  re-examined  in  three  tests  each. 
One  boy  made  a  score  7  points  higher  than  his  score  in  February,  1916, 
a  score  equal  to  his  score  in  February,  1917,  and  a  score  5  points  lower 
than  his  score  in  June,  1917.  In  values  of  Q  his  scores  were  4.3  Q  higher, 
equal,  and  1.5  Q  lower  than  his  respective  original  scores.  This  boy 
ranked  59  in  ability  and  63.5  in  variability.     (1  being  least  variable.) 
Table XXX analyzes by individuals the difference in value of
Q  between  the  original  and  special  testings.  It  shows  that  the 
number  of  gains  in  score  greatly  exceeds  the  number  of  losses. 
There  are  26  cases  of  gain  and  7  of  loss  and  2  cases  where  the 
score  is  the  same  as  in  the  original  testing.  From  the  results 
in  this  table  it  is  found  that  the  average  gain  in  points  over  the 
original scores is 5 in spelling, 1.1 in opposites, 5.9 in mixed
relations, and 5.6 in easy directions. These results are
supplementary to the results of Table XXIX. Of the sixteen
individuals re-examined all but one were below the median in ability
and all but one were more variable than the median.

4.     The  Causes  of  Extreme  Variability 

The  results  of  this  investigation  show  a  rather  large  amount 
of  variability  among  the  different  achievements  of  the  individual 
when  compared  with  the  variability  of  the  group.  By  the  meth- 
od described  in  Section  IV  the  amount  of  this  variation  has  been 
measured  and  the  results  have  been  given  in  Section  V.  In  this 
section  the  extreme  cases  of  variability, — those  3Q  or  more  from 
the  median  scores  of  the  individual — have  been  segregated  and 
classified  by  tests  and  by  individuals.  Also  the  results  from 
a  re-examination  of  certain  boys  making  extremely  variable 
scores  have  been  compared  with  the  results  from  the  original 
tests  of  the  same  boys.  Some  evidence  concerning  the  causes 
of  extremely  variable  or  erratic  scores  has  appeared  in  connec- 
tion with  other  parts  of  the  problem.  Under  this  topic  such 
evidence  will  be  collected  and  some  additional  data  will  be  dis- 
cussed. 
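
The segregation of extreme cases described above may be sketched as follows. This is an illustration only: it assumes each achievement of a boy has already been expressed in multiples of Q by the method of Section IV, and the scores of the pupil shown are hypothetical.

```python
from statistics import median

def erratic_scores(q_scores, threshold=3.0):
    """Return the scores lying 3 Q or more from the boy's own median score.

    q_scores maps a test name to the boy's score in multiples of Q.
    The result maps each erratic test to its signed deviation (plus or minus).
    """
    centre = median(q_scores.values())
    return {
        test: round(value - centre, 2)
        for test, value in q_scores.items()
        if abs(value - centre) >= threshold
    }

# Hypothetical pupil, not taken from the data of this investigation.
pupil = {
    "Spelling": -3.4,
    "Opposites": 0.2,
    "Mixed Relations": -0.1,
    "Easy Directions": 0.4,
    "Composition": 3.6,
}
flagged = erratic_scores(pupil)
print(flagged)
```

Here the median achievement is 0.2 Q, so spelling is erratic in the minus direction and composition in the plus direction.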

The  following  causes  appear  to  be  factors  which  may  operate 
individually  or  in  combinations  to  produce  extremely  variable 
or  erratic  scores: 

a.  The  nature  of  the  tests  used. 

b. The administration of the tests.

c.  Accidental  or  unusual  occurrences. 

d.  Statistical  treatment  of  results. 

e.  The  ability  of  the  individual  in  different  traits. 

The  effect  of  these  causes  individually  can  be  discussed  from 
a  general  standpoint,  but  the  extent  to  which  each  one  operated 
in producing specific erratic scores can not be definitely
determined from the data of this investigation.

The tests used are well standardized in degrees of difficulty.
However,  the  amount  of  the  increasing  increments  of  difficulty 
which  can  be  shown  by  the  individual  score  and  also  the  range 
of  ability  which  is  covered  vary  considerably.  In  group  meas- 
urements increments  smaller  than  the  interval  of  the  scale  can 
be  measured  by  interpolation  within  the  interval  of  the  scale, 
but  in  measurements  of  the  individual  such  calculations  cannot 
be  made  and  consequently  increments  smaller  than  the  interval 
of  the  scale  cannot  be  measured. 

In  transmuting  the  original  scores  into  multiples  of  Q  the  value 
of  a  score  was  taken  as  one  half  interval  higher  than  the  actual 
score  in  all  tests  except  composition  in  which  case  the  value 
taken  was  the  exact  score.  The  values  taken  thus  represent  the 
midpoint  of  all  the  unmeasured  achievements.  Consequently 
the  maximum  displacement  that  could  be  produced  by  the  in- 
terval of  the  scale  is  just  barely  less  than  one  half  the  amount 
of  Q  representing  the  interval  of  the  original  distribution.  The 
amount  of  Q  representing  one  half  the  interval  of  each  of  the 
original  distributions  is  shown  in  Table  XXXI. 
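
The rule just described may be illustrated by a short sketch. It assumes, since the method of Section IV is not reproduced here, that a score is expressed as its distance from the median of the group divided by Q; the function names and the numbers are hypothetical.

```python
def value_in_q(score, group_median, q, interval, exact=False):
    """Deviation of a score from the group median in multiples of Q.

    The score is credited at the midpoint of its interval (score plus one
    half interval) unless exact is True, as for composition.
    """
    value = score if exact else score + interval / 2
    return (value - group_median) / q

def half_interval_in_q(interval, q):
    """Upper limit of the displacement the interval of the scale can produce."""
    return (interval / 2) / q

# Hypothetical test: interval of 1 point, group median 10.5, Q of 2 points.
example = value_in_q(score=8, group_median=10.5, q=2.0, interval=1)
limit = half_interval_in_q(interval=1, q=2.0)
print(example, limit)
```

In this hypothetical case a true achievement anywhere between 8 and 9 points is recorded as 8.5, so the recorded value can be displaced by less than .25 Q.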

TABLE XXXI

The Amount of Q Representing One-Half the Interval of the
Distributions of the Different Tests

                                     Feb.     Feb.     June
                                     1916     1917     1917

Woody Multiplication                  .24      .42
Woody Division                        .19      .75
Hotz Algebra, Add. and Subt.                            .25
Hotz Algebra, Mult. and Div.                            .19
Trabue B, J, L                        .30      .41      .46
Trabue C, K, M                        .38      .35      .25
Reading Tests                         .12      .11      .16
Visual Vocabulary                     .14      .04      .12
Composition                           .09      .12      .10
Spelling                              .32      .13      .16
Opposites                             .27     1.07      .33
Mixed Relations                       .11      .17      .17
Easy Directions                       .21      .34      .66

Table  XXXI  shows  the  extent  to  which  the  different  tests 
may  have  failed  to  record  the  achievements  of  the  individual 
due  to  the  extent  of  the  interval  of  the  distribution  when  trans- 
muted into  Q.  Obviously  an  allowance  for  this  can  not  be 
made  for  any  given  score  because  the  known  value  nearest  the 
achievement has already been taken. The table shows that in
only a few of the thirty-three tests could the interval of the
distribution be of much significance in causing extremely variable
scores. It may affect not only the extremely variable scores
but  also  all  other  scores  to  the  same  extent.  The  point  is  that 
if  it  happens  to  act  upon  an  achievement  that  should  be  just 
less  than  three  Q  in  either  direction  from  the  median  of  the  in- 
dividual's achievement  it  puts  that  score  in  the  group  called 
erratic  in  this  study.  The  effect  of  the  interval  of  distribution 
is  significant  only  in  connection  with  particular  scores ;  it  does 
not  affect  theoretically  either  the  average  amount  of  variability 
found  in  Section  V  or  the  total  number  of  erratic  scores  found 
in  Section  VI. 

Another  phase  of  the  first  factor  in  the  causation  of  erratic 
scores  is  the  range  of  ability  covered  by  the  test.  The  original  dis- 
tributions show  that  the  upper  limit  of  the  scale  was  not  reached 
by  any  of  the  pupils  in  the  following  tests :  arithmetic,  algebra, 
language,  reading,  visual  vocabulary,  and  composition.  In  the 
spelling  tests  the  upper  limit  was  reached  by  seven,  one,  and  eight 
pupils  in  the  three  testings  respectively.  In  the  association  tests 
only  a  few  pupils  reached  the  upper  limit  except  in  the  oppo- 
sites  test  of  February,  1917  and  the  easy  directions  tests  of  Feb- 
ruary and  June,  1917  when  a  relatively  large  number  of  the  pu- 
pils made  the  highest  score  possible.  These  tests  did  not  permit 
the  best  pupils  to  show  their  ability  in  comparison  with  the  rest 
of  the  group.  Consequently  the  nature  of  the  tests  is  a  factor 
that  probably  prevented  some  extremely  variable  scores  in  the 
plus  direction  from  the  median. 

The  distributions  of  scores  resulting  from  ranges  insufficient 
to  cover  the  ability  of  all  the  pupils  increased  to  some  extent 
the  amount  of  Q  representing  the  value  of  the  score.  This  has 
been  illustrated  by  Fig.  14  and  discussed  under  Topic  2  of  Sec- 
tion IV.  It  was  pointed  out  there  that  the  skewness  of  the 
curve  tends  to  reduce  the  extent  of  the  Q  of  the  original  dis- 
tribution from  what  it  would  be  in  a  normal  distribution.  This 
increases the variability of the scores when expressed in
multiples of the Q which is smaller than it would be in a normal
distribution. The effect is cumulative so that the farther a score
is  from  the  median  of  the  original  distribution  the  greater  is 
the amount of spurious variability produced by this cause. On
account  of  its  cumulative  effect  this  cause  probably  placed  some 
scores  in  the  group  called  erratic  and  did  not  have  the  op- 
posite effect  on  other  scores.  From  the  distributions  it  would 
appear  that  skewness  could  be  effective  to  a  marked  degree  only 
in  the  minus  direction  and  only  in  the  opposites  test  of  February, 
1917  and  the  Easy  Directions  tests  of  February  and  June,  1917. 
The  same  cause  could  operate  to  make  a  score  less  than  3Q  from 
the  individual  median  when  it  should  be  more  than  3Q  from  it 
if  the  median  achievement  of  the  individual  is  more  than  3Q 
from  the  median  of  the  group.  However,  there  are  no  such 
cases  in  this  investigation.  The  form  of  the  distribution  as  a 
cause  of  extreme  variability  on  the  part  of  the  individual  may 
be  attributed  partly  to  the  nature  of  the  tests  and  partly  to  the 
statistical  treatment  of  the  results. 
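
The effect of an insufficient range upon Q may be shown by a small numerical sketch. It assumes Q is the quartile deviation (one half the distance between the first and third quartiles), and it uses hypothetical scores rather than the distributions of this study.

```python
from statistics import median, quantiles

def quartile_deviation(scores):
    """Q taken as half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(scores, n=4)
    return (q3 - q1) / 2

full_range = list(range(1, 22))              # hypothetical scores 1 to 21
capped = [min(s, 14) for s in full_range]    # same pupils, test tops out at 14

low_score = 1
q_full = quartile_deviation(full_range)
q_capped = quartile_deviation(capped)
dev_full = (low_score - median(full_range)) / q_full
dev_capped = (low_score - median(capped)) / q_capped

print(f"Full range:   Q = {q_full:.2f}, low score at {dev_full:.2f} Q")
print(f"Capped range: Q = {q_capped:.2f}, low score at {dev_capped:.2f} Q")
```

The median is the same in both cases, but the capped distribution has the smaller Q, so the same low score lies more multiples of Q below the median; the spurious addition grows in proportion to the distance of the score from the median.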

The  extent  of  the  interval,  the  range  of  ability  covered  by 
the  tests,  and  the  form  of  distribution  of  the  scores  have  been 
considered  in  connection  with  the  nature  of  the  tests  as  possible 
causes  of  extreme  variability.  The  extent  of  the  interval  of  the 
distribution  was  found  to  have  but  little  effect  in  causing  erratic 
scores,  and  since  it  has  a  compensating  effect  its  significance  is 
almost  negligible.  The  range  of  ability  covered  by  the  test 
probably  prevented  some  scores  from  being  erratic  in  the  plus 
direction.  The  form  of  distribution  of  the  scores  in  certain  tests 
tends  to  magnify  the  amount  of  variability  in  the  minus  direc- 
tion, and  probably  makes  the  number  of  scores  in  the  minus 
direction  larger  than  it  should  be.  The  last  two  causes  probably 
account  to  some  extent  for  the  disparity  between  the  number  of 
extremely  variable  scores  in  the  plus  direction  and  the  number  in 
the  minus  direction. 

The  practical  significance  of  these  causes  of  variability  is 
illustrated  by  the  use  of  standardized  tests  in  the  classification 
of  pupils  for  the  purposes  of  instruction.  Here  the  individual 
rather  than  the  group  is  the  unit  to  be  dealt  with.  In  order  to 
classify  the  individuals  of  a  group  with  adequate  exactness  the 
range  of  the  tests  should  be  somewhat  greater  than  the  range 
of  ability,  and  the  interval  between  scores  should  be  small. 

The extent to which the administration of the tests caused
erratic scores will have to be judged by the conditions under
which the tests were given. In the first testing five of the eleven
tests  were  given  at  the  schools  from  which  the  pupils  came. 
The  remaining  six  tests  were  given  at  the  Speyer  School.  The 
tests  were  administered  by  several  graduate  students  under  the 
direction  of  Professor  Briggs.  The  second  testing  was  conducted 
by  Dr.  Fretwell,  who  was  one  of  the  group  giving  the  tests  the 
first  time,  and  by  the  writer.  The  third  testing  was  conducted 
by  the  writer.  Five  tests  of  the  first  testing  were  given  in  the 
regular  class  rooms  of  the  public  schools.  The  remaining  six 
tests  of  the  first  testing  and  all  the  tests  of  the  last  two  testings 
were  given  in  the  regular  class  rooms  of  the  Speyer  School. 
The  tests  were  given  during  scheduled  periods  of  the  school  day. 
Pupils  were  tested  in  regular  class  groups  of  about  twenty-five 
each.  Instructions  concerning  the  tests  were  brief  and  of  similar 
nature  for  each  test  at  each  succeeding  testing.  Considering 
these  conditions  it  is  probable  that  the  administration  of  the 
tests  had  but  little  effect  in  producing  erratic  scores. 

Accidental  or  unusual  occurrences  probably  had  a  marked 
effect  upon  the  scores  of  certain  individuals.  In  one  of  the  easy 
directions  tests,  for  example,  the  completion  of  a  face  by  the 
addition  of  the  nose  caused  unusual  merriment  for  certain  pupils 
who  by  chance  or  design  produced  a  rather  grotesque  face  by  the 
type  of  nose  added.  In  a  few  cases  this  diversion  impeded  ma- 
terially the  speed  of  the  work  thus  precluding  a  normal  achieve- 
ment. 

In  one  or  two  instances  in  the  spelling  tests  pupils,  either  be- 
cause of  some  accidental  occurrence  or  because  of  slowness,  fell 
behind  the  rate  at  which  the  words  were  being  pronounced. 
This  probably  reduced  the  number  of  words  spelled  correctly. 
It  is  probable,  however,  that  if  any  extremely  variable  scores 
were  produced  by  accidental  occurrences  practically  all  were 
in  spelling  and  in  the  rate  tests. 

One  bit  of  evidence  bearing  upon  this  cause  and  also  upon  the 
administration  of  the  tests  is  found  in  connection  with  Table 
XXIX  concerning  the  amount  of  reduction  in  per  cent  of  erratic 
scores  by  special  testing.  When  tested  in  groups  of  from  two 
to  five  in  spelling  and  in  the  rate  tests,  certain  pupils  who 
had  made  extremely  variable  scores  in  the  original  testings  re- 
duced their  number  of  erratic  scores  by  from  four  to  thirty-three 
per  cent  of  the  total  number  of  scores.     Since  the  special  tests 
were given about three weeks after the third testing it is
probable that practice entered into the reduction of these per cents to
some  extent.  Apparently  accidental  occurrences  had  little  ef- 
fect upon  the  difficulty  tests. 

The  statistical  method  of  combining  the  results  was  chosen 
because  it  seemed  to  "preserve  all  the  refinement  of  the  original 
measurements"  to  a  greater  extent  than  other  methods.  That 
the  variability  of  certain  scores  has  been  magnified  to  some  de- 
gree by  this  method  has  already  been  pointed  out  in  connection 
with  the  nature  of  the  tests.  However,  the  method  used  is  cer- 
tainly only  a  small  factor  in  the  causation  of  extremely  variable 
scores. 

Consideration  of  the  ability  of  the  individual  as  a  factor  in 
the  causation  of  extremely  variable  scores  involves  a  study  of  the 
individual's  variability  from  an  angle  slightly  different  from 
the  attack  made  thus  far.  The  problem  of  chief  concern  has 
been  the  variability  of  the  individual  from  his  own  median 
achievement.  The  problem  presented  now  is  the  variability  of 
the  individual  in  his  own  achievements  in  the  same  or  similar 
tests  at  succeeding  testings.  If  exactly  the  same  tests  had  been 
used  at  each  succeeding  testing  this  variability  could  be  meas- 
ured in  terms  of  absolute  amounts  of  gain  and  loss  in  the  dif- 
ferent tests.  Since  the  identical  tests  were  not  repeated  in  all 
cases  such  variability  must  be  measured  in  a  different  way. 
This  can  be  accomplished  either  by  finding  the  difference  be- 
tween the  Q  values  of  the  scores  in  similar  tests  for  the  different 
testings,  or  by  ranking  the  individual's  achievements  in  each 
testing  and  finding  the  variation  in  ranks  of  the  scores  in  ques- 
tion for  the  three  testings. 
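
The two ways of measuring this variability may be sketched as follows. The scores are hypothetical Q values for one boy in two testings, rank 1 is given to his highest achievement, and ties are not handled; none of this is taken from the data of the investigation.

```python
def ranks(scores):
    """Rank a boy's achievements from 1 (highest Q value) downward."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {test: position + 1 for position, test in enumerate(ordered)}

def rank_differences(first, second):
    """Absolute change in rank of each test between two testings."""
    r1, r2 = ranks(first), ranks(second)
    return {test: abs(r1[test] - r2[test]) for test in first}

def q_differences(first, second):
    """Change in Q value of each test between two testings."""
    return {test: round(second[test] - first[test], 1) for test in first}

# Hypothetical Q values for one boy in similar tests at two testings.
first = {"Spelling": -3.2, "Opposites": 0.5, "Reading": 1.1, "Composition": -0.4}
second = {"Spelling": 0.3, "Opposites": 0.9, "Reading": 0.2, "Composition": -1.0}

print(q_differences(first, second))
print(rank_differences(first, second))
```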

If  the  scores  which  vary  greatly  from  the  individual's  median 
achievement  are  mere  chance  happenings  among  the  total  num- 
ber of  achievements  there  should  be  a  greater  variation  among 
the  ranks  in  the  three  testings  of  the  abilities  in  which  such 
scores  occur  than  among  the  ranks  of  the  abilities  which  have 
no  scores  at  so  great  a  distance  from  the  median.  This  of  course 
would  not  hold  if  the  ranking  of  all  the  individual's  achieve- 
ments is  caused  by  mere  chance,  but  even  if  this  is  true  the 
rank  among  the  individual's  achievements  of  the  abilities  hav- 
ing scores  at  the  greatest  distance  from  the  median  would  have 
the  same  cause  as  the  rank  of  any  other  ability  and  should  not 
vary in rank any more than the abilities having no scores at the
extreme  distance  from  the  median. 

The  variability  of  the  individual  in  his  own  achievements 
in  the  three  testings  has  been  found  for  nine  of  the  pupils  most 
variable  in  the  range  of  their  scores  in  the  first  two  testings 
and  for  eight  of  the  pupils  who  had  but  one  score  of  3Q  or  more 
in  the  first  two  testings,  and  also  for  eight  of  the  pupils  most 
variable  in  all  three  testings,  and  eight  who  had  but  one  score 
of  3Q  or  more  in  all  three  testings.  The  results  are  given  in 
Table  XXXII.  In  the  treatment  of  the  first  two  testings  the 
two  Trabue  tests  are  combined  because  of  their  similarity  thus 
making  ten  scores  to  be  ranked.  In  the  treatment  of  the  three 
testings  the  two  Trabue  tests  are  combined,  and  the  arithmetic 
and  algebra  tests  are  omitted  because  they  are  not  similar  enough 
to  be  comparable,  thus  making  eight  scores  to  be  ranked. 

The  results  given  in  Table  XXXII  show  that  the  variation  of 
the  scores  at  the  greatest  distance  from  the  individual's  median 
achievement  is  not  essentially  different  from  the  variation  of 
those  nearer  the  median.  They  vary  in  rank  about  the  same  as 
the  other  achievements.  The  last  two  columns  of  the  table  show 
further  that  the  variation  of  the  erratic  scores  in  the  rankings 
of  the  pupils  having  only  one  erratic  score  is  no  greater  than  the 
variation  in  the  rankings  of  the  pupils  who  have  the  most  erratic 
scores.  Inspection  of  charts  like  the  one  shown  in  Figure  15 
shows  that  the  absolute  amount  of  variation  of  these  extreme 
scores  is  greater  than  the  variation  of  those  nearer  the  median, 
but,  like  the  items  near  the  extremes  of  any  distribution,  they 
normally  would  be  expected  to  vary  more  in  absolute  amount. 
The  interpretation  of  these  results  is  that  in  the  ranking  of 
the  individual's  achievements  the  variability  of  the  extreme 
scores  is  caused  by  mere  chance  to  no  greater  extent  than  the 
variability  of  the  scores  nearer  the  median. 
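
Table XXXII gives 3.3 and 2.6 as the average differences between ranks to be expected by chance ranking for ten and for eight ranks. The text does not show how these were computed; the sketch below assumes that under chance the rank in one testing is unrelated to the rank in the other, which reproduces both values.

```python
from fractions import Fraction
from itertools import product

def chance_mean_rank_difference(n):
    """Mean absolute difference of two ranks (1..n) paired at random."""
    total = sum(abs(a - b) for a, b in product(range(1, n + 1), repeat=2))
    return Fraction(total, n * n)

for n in (10, 8):
    mean = chance_mean_rank_difference(n)
    print(f"{n} ranks: {float(mean):.3f}")
```

The same quantity has the closed form (n^2 - 1) / (3n), which gives 3.3 for ten ranks and 2.625 for eight.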

From  one  point  of  view  this  topic, — the  causes  of  extreme  va- 
riability, is  a  study  of  the  forces  that  prevent  perfect  correlation 
between  different  testings  of  the  same  ability,  and  from  the 
same  viewpoint  the  whole  investigation  is  a  study  of  the  lack 
of perfect correlation between abilities and between different
testings  of  the  same  ability.  If  there  were  perfect  correlation 
in  the  above  cases  there  would  be,  of  course,  no  variability  among 
either  the  achievements  of  the  individual  in  different  tests  or 
TABLE XXXII

The Variability of Certain Individuals in the Ranking of Their
Own Achievements for the Different Testings

                                  9 of the Most       8 Pupils Having     8 of the Most       8 Pupils Having
                                  Variable Pupils     One Erratic Score   Variable Pupils     One Erratic Score
                                  in the First        Each in First       in All              Each in All
                                  Two Testings        Two Testings        Three Testings      Three Testings

Number of differences found
between ranks one or more of
which are erratic.                     16                   8                  39                  24

Number of differences found
between ranks none of which
is erratic.                            74                  72                 153                 168

Average difference between
ranks one or more of which
are erratic.                           2.6                 2.9                 1.9                 1.6

Average difference between
ranks none of which is
erratic.                               3.4                 3.3                 2.0                 2.3

(Average difference between
such number of ranks by
chance ranking.)                       3.3                 3.3                 2.6                 2.6

Median difference between
ranks one or more of which
are erratic.                           1.5                 2.3                 1.4                 1.3

Median difference between
ranks none of which
is erratic.                            2.1                 3.3                 2.1                 2.3

(Median difference between
such number of ranks by
chance ranking.)                       2.4                 2.4                 1.8                 1.8

Explanation  of  the  table.  The  scores  of  9  of  the  most  variable  pupils 
were  ranked  from  1  to  10  in  each  of  the  first  two  testings.  The  differences 
between  the  ranks  in  similar  tests  were  found, — 90  in  all.  Of  these  16  are 
differences  between  ranks  of  scores  one  or  both  of  which  are  erratic  and 
74  are  between  scores  neither  of  which  is  erratic.  The  average  difference 
between  the  former  scores  is  2.6  ranks  and  between  the  latter  is  2.4  ranks. 
The  median  difference  between  the  scores  one  or  both  of  which  are  erratic 
is  1.5  ranks  and  between  the  scores  neither  of  which  is  erratic  is  2.1  ranks. 
among the ranks in the same or similar tests at different times.
Such  relation,  however,  has  not  been  found  to  exist.  Since  vari- 
ability is  found  to  exist  the  question  as  to  how  constant  it  is  logi- 
cally follows.  Forty-five  coefficients  of  correlation  between 
rankings  of  individual  achievements  in  different  testings  were 
found. They range from +.87 to -.29 and average +.32 but
are  not  considered  to  have  enough  significance  to  be  introduced 
as  evidence  of  any  relation. 
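
The text does not state which formula was used for these coefficients. As an illustration only, the sketch below assumes Spearman's rank-difference coefficient, a common choice for correlating two rankings; the rankings shown are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical ranks of one boy's eight achievements in two testings.
first_testing = [1, 2, 3, 4, 5, 6, 7, 8]
second_testing = [2, 1, 4, 3, 6, 5, 8, 7]

print(round(spearman_rho(first_testing, second_testing), 2))
```

Identical rankings give +1 and exactly reversed rankings give -1, so a range such as +.87 to -.29 covers everything from close agreement to slight reversal.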

TABLE  XXXIII 

Teachers'  Ratings  on  Certain  Points  Concerning  the  Work 

of Fifteen Pupils

Least variable


42 
2

9 

1 

1 

2 

2 

3  1 

3 

1 

12 

7 

1 

1 

20 

2 

17 

2 

1 

1 

2 

*> 

3 

1 

3  1 

3 

1 

12 

5 

2 

3 

1 

2 

64 

4 

2 

1 

1 

1 

2 

1 

1 

2 

1 

1 

1 

2 

5 

6 

9 

5 

14 

3 

59 

1 

3 

1 

3 

1 

3 

4 

4 

3 

17 

6 

3 

26 

3 

1 

3 

1 

3 

1 

3 

1 

3 

1 

15  4 

1 

7 

16 

1 

Tot. 

8 

7 

_5^ 

10 

8 

^ 

10 

9 

1 

10 

9 

1 

10 

J_ 

Z_ 

4839 

13 

Near  median 


46 

3 

1 

2 

2 

4 

4 

3 

1 

16 

4 

33 

33.5 

2 

1   1 

58 

3 

1 

3 

1 

4 

4 

4 

18 

1 

1 

35 

29.5 

2 

73 

3 

1 

3 

1 

3 

1 

4 

4 

17 

3 

36 

47 
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1 

5 

2 

2 

2 

2 

2 

2 

2 

2 

1 

1 

2 

3 

7 

10 

39 

43 

3 

2 

45 

4 

1 

3 

4 

4 

1 

3 

2 

18 

40 

41 

2 

4 

Tot. 

_6 
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8 
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5  7 

8 

6 

6 

8 

7 

5 

37 

31 

32 

Most  variable 


61 

1 

3 

3 

1 

2 

2 

2 

2 

2 

2 

10 

10 

65 

46 

2 

2 

29 

9 

1 

2 

2 

1 

2 

1 

1 

8 

2 

1 

1 

8 

9 

3 

68 

60 

2 

1 

19 

2 

1 

1 

2 

1 

1 

2 

1 

1 

2 

1 

1 

2 

1 

6 

9 

5 

69 

72 

5 

1 

52 

1 
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1 
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11 

2 
Further information concerning the permanency of individual
variability was sought by asking the teachers of these boys to rate
some of them on certain points. Table XXXIII gives the results of the
ratings by four teachers who had had all of the group selected in
their classes for a considerable length of time. The following
instructions were given to the teachers: "Check in the proper column
your judgment of the following boys as to whether they have been
consistent, normal, or quite erratic in the traits noted on the
accompanying form. Add any remarks that will help to explain the
character of their work." Normal was defined to mean the amount of
variation that is normally expected. In the form given to the teachers
the pupils were listed in groups of three, each group containing one
of the least variable according to the tests, one of the most
variable, and one near the median in variability. The findings
concerning variability in the tests were unknown to the teachers. In
Table XXXIII the least variable, the most variable, and those near the
median in variability according to the tests are segregated. The table
reads as follows: Individual No. 42 was judged consistent twice and
normal twice in preparation of lessons, etc. Columns 6 to 9 inclusive
were not in the form given to the teachers. In column 6 the teachers'
judgments are summarized; column 7 shows the variability of the pupil
as indicated by the range and approximation of the interquartile range
in the tests; column 8 shows the number of the individual's scores 3Q
or more from his median in all testings; and column 9 shows the number
of teachers suggesting reasons for the boy's variability.

The totals in column 6 of the table show that there is a slight
tendency for the ratings of the teachers to agree with the rankings
by the tests in the matter of variability. It is so slight, however,
that it has but little significance. The last column of the table
shows that the teachers suggested reasons for extreme variability in
all three groups, offering about as many for one group as for another.
The following causes for erratic work were mentioned most frequently:
physical defects, nervousness, absence, home life, and outside work.
In order to get more definite information concerning the permanency
of individual variability, a systematic checking in certain points
would have to be continued by the teachers over a fairly long period
of time.
It should be pointed out in connection with this section of the
study that the individuals making these extreme scores, both those
making them above and those making them below their medians, offer
much opportunity for further investigation by repetition of these
same tests and other tests, and also by tests of their physical as
well as their mental attainments. The time and labor necessary for
such a study precluded the possibility of incorporating it in this
investigation.

Answers to the following questions asked in connection with
the problem have been proposed in this section of the study.

Question 6. To what extent are there extremely variable or erratic
scores? The line marking off erratic scores was arbitrarily placed at
a distance of 3Q in each direction from the individual medians. On
this basis the average number of erratic scores in all three testings
was found to be 4.5 per cent of the total number of scores for each
testing. The later testings show an increase in the number of erratic
scores. The number in the second testing is 46 per cent larger than
the number in the first testing, and the number in the third testing
is 16 per cent larger than the number in the second testing.
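As a rough illustration of the 3Q rule, the sketch below flags the erratic scores of a single pupil. It assumes the scores have already been transmuted into multiples of the group Q; the function name `erratic_scores` and the eleven sample values are hypothetical and are not taken from the study.

```python
from statistics import median

def erratic_scores(q_scores, limit=3.0):
    """Return the scores lying `limit` Q or more from the pupil's own median.

    q_scores: one pupil's test scores, assumed to be expressed already in
    multiples of the group Q.
    """
    center = median(q_scores)
    return [s for s in q_scores if abs(s - center) >= limit]

# Hypothetical pupil with eleven scores in Q units (illustrative values only).
pupil = [0.4, -0.2, 1.1, 0.0, -3.6, 0.7, 0.3, -0.5, 0.9, 0.2, 3.4]
flagged = erratic_scores(pupil)
print(flagged)
```

With these values the pupil's median is 0.3, so one score in the minus direction and one in the plus direction are flagged.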

Question 7. How do the bright, the mediocre, and the dull pupils
compare as to the number who make erratic scores, and as to the number
of such scores each one makes? The extreme scores of this group of
pupils are not especially characteristic of any one division of the
group. Forty-four per cent of all the boys who made erratic scores are
in the third tertile; 32 per cent are in the second tertile; and 24
per cent are in the first tertile. Using the average number of pupils
making erratic scores and the average number of erratic scores made,
it is found that the average number of erratic scores per pupil is
1.14 in tertile I, 1.07 in tertile II, and 1.36 in tertile III.

Question 8. What are the causes of the extremely variant scores?
The causes of the scores varying 3Q or more from the individual's
median achievement, in so far as this study throws light upon them,
have been analyzed under the five headings: the nature of the tests
used; the administration of the tests; accidental or unusual
occurrences; statistical treatment of the results; and the ability of
the individual in different traits.

The nature of the tests used probably prevented some cases of
extremely variable scores in the plus direction from the individual's
median which would have appeared if the range of ability covered by
the tests had been greater. The nature of the tests and the
statistical treatment of the results seem to have magnified the amount
of variability of a relatively small proportion of the scores. The
administration of the tests, in so far as it can be judged by the
conditions of the testings, had practically no effect upon the
variability of the scores. Accidental or unusual occurrences probably
caused a few erratic scores. Under a more detailed administration of
the tests such occurrences and their effect could be definitely
accounted for in the results. From the evidence of this study it
appears that the ability of the individual is the greatest of the five
factors in the causation of scores which vary 3Q or more from his
median achievement.


VII 

CORRELATION    BETWEEN    MEASURES    OF    ABILITY, 

MEASURES  OF  VARIABILITY,  AND  MEASURES 

OF  ABILITY  AND  VARIABILITY 

1.     Correlation  Between  Measures  of  Ability 

The results that have been set forth up to this point have dealt
with variation. They may be considered as showing certain positive
relations, but in an indirect way. In this section of the
investigation different relations will be studied by means of
coefficients of correlation. The last question in the statement of
the problem will be considered. This question concerns the relation
between different measures of ability, the relation between different
measures of variability, and the relation between measures of ability
and variability.

The first coefficients that are given are between different methods
of ranking pupils for composite achievement. Three methods were used.
First, each one of the seventy-two pupils was ranked by the average of
his eleven ranks. That is, the pupils were ranked from one to
seventy-two in each test. The eleven ranks of each pupil were then
averaged and these averages were ranked from one to seventy-two, the
smallest being ranked one. The second method was the same as the first
except that the median rank was used instead of the average rank. The
third method was by median rank in the eleven tests as obtained from
the scores transmuted into multiples of Q. Using the median of the
individual's ranks in values of Q the pupils were ranked from one to
seventy-two as in the other methods.
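The first two methods of composite ranking can be sketched as follows. This is only an illustration under stated assumptions: the helper names `rank_ascending` and `composite_rank` are hypothetical, the sample ranks are invented, and the treatment of ties (tied pupils share the average of the tied positions) is an assumption, since the text does not say how ties were handled.

```python
from statistics import mean, median

def rank_ascending(values):
    """Rank values from 1 upward, smallest first; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def composite_rank(ranks_per_pupil, summary=mean):
    """ranks_per_pupil[p] holds pupil p's rank in each test (1 = best)."""
    return rank_ascending([summary(r) for r in ranks_per_pupil])

# Hypothetical ranks of four pupils in three tests.
ranks = [[1, 2, 1], [2, 1, 4], [3, 4, 2], [4, 3, 3]]
by_average = composite_rank(ranks, mean)
by_median = composite_rank(ranks, median)
print(by_average, by_median)
```

In this small example the average rank separates all four pupils, while the median rank leaves the last two tied, which illustrates the coarser discrimination of the median when only a few measures are available.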

Three correlations were then calculated between the rankings by each
method: February 1916 with February 1917; February 1917 with June
1917; and February 1916 with June 1917. The coefficients of these
correlations are given in Table XXXIV. In calculating all the
coefficients in this section the formula

was used. The value of these coefficients in terms of the Pearson r
has been inferred from a table¹ of such values. In all cases the
inferred value of the coefficient is given. The unreliability of the
coefficients was determined by the formula

$$\mathrm{P.E.}_{\text{t.}r - \text{obt.}r} = .6745\,\frac{1 - r^{2}}{\sqrt{n}}$$

"The probable divergence of the true coefficient of correlation from
that obtained from a limited random selection of related pairs, is a
variable fact with a mode at 0, and a variability which serves as the
measure of the unreliability."² The P.E. is the measure limiting the
fifty per cent of this variability which is nearest the coefficient
obtained.
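The probable errors reported with the coefficients can be checked with a short computation. The sketch assumes the formula in its usual form, .6745 (1 - r²)/√n, with n = 72 pupils; the function name `probable_error` is hypothetical, and the three values of r are those listed for the average-rank method in Table XXXIV.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a correlation coefficient: .6745 * (1 - r^2) / sqrt(n).

    The form of the formula is an assumption of this sketch.
    """
    return 0.6745 * (1 - r * r) / sqrt(n)

N_PUPILS = 72  # size of the group in this study

for r in (0.77, 0.78, 0.68):
    print(f"r = {r:.2f}  P.E. = {probable_error(r, N_PUPILS):.2f}")
```

Under this assumption the computed values round to .03, .03, and .04, in agreement with the tabled probable errors for that method.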


TABLE XXXIV

Correlation Between Composite Rankings in Ability

                                                        P.E.
                                                  r     of r
Average Rank by Rank in Eleven Tests
    Feb. 1916 with Feb. 1917 .................   .77    .03
    Feb. 1917 with June 1917 .................   .78    .03
    Feb. 1916 with June 1917 .................   .68    .04
Median Rank by Rank in Eleven Tests
    Feb. 1916 with Feb. 1917 .................   .69    .04
    Feb. 1917 with June 1917 .................   .73    .04
    Feb. 1916 with June 1917 .................   .53    .06
Median Rank by Values of Q in Eleven Tests
    Feb. 1916 with Feb. 1917 .................   .69    .04
    Feb. 1917 with June 1917 .................   .69    .04
    Feb. 1916 with June 1917 .................   .54    .06



The method of ranking the pupils by their average achievement gives
distinctly higher coefficients of correlation than either of the other
methods. The results obtained by ranking them in ability by the median
of their eleven ranks agree very closely with the results obtained by
ranking them by their median rank in values of Q. Coefficients of
correlation obtained from both of these methods are approximately 10
per cent lower than the coefficients obtained from the method by
average rank. The reason for the difference between the coefficients
obtained from the rankings by the average of the eleven ranks and the
coefficients obtained from the rankings by the median of the eleven
ranks is obvious. With only a few measures a small difference in the
median score, resulting from chance error or the inherent lack of fine
discriminations on account of the small number of tests, affects the
median rank of the individual much more than several such differences
affect the average rank. In the latter case such differences tend to
offset each other or, if they do not entirely balance each other, they
enter into a composite where so much does not depend upon a single
measure.

1 Thorndike, E. L., Mental and Social Measurements, p. 225.

2 Ibid., p. 193.

One other point should be brought out in connection with the
coefficients of correlation in Table XXXIV. By each of the three
methods the correlation between the February 1916 and June 1917
rankings is about 10 per cent lower than the correlation between the
rankings of the testings closer together in point of time. It has been
found in another section of the study that the number of pupils making
erratic scores and the number of erratic scores per pupil increased
with each succeeding testing. Granting that there was improvement in
all the abilities tested, this shows that the amounts of improvement
of different pupils in their different abilities were increasingly
disproportionate the longer the pupils remained in school. The
coefficients of correlation mentioned above tend to show that the
improvement of the pupils in composite ability also was made at
varying rates, and that the rate of improvement of different pupils
did not fluctuate, thus overcoming the inequalities, but rather that
the inequalities became more pronounced the longer the pupils remained
in school.

The practical importance of such varying rates of improvement bears
upon the length of time an evaluation of an achievement by such tests
can be considered as a valid index of the ability of the pupil. As
examples of such variation two cases from this investigation are
cited. Pupil No. 51 ranked 67 by the tests in February, 1916; 35 in
February, 1917; and 8 in June, 1917. This pupil was placed in group 6
in school in February, 1916. By the judgment of the teachers he was
advanced to group 5 in April, group 4 in May, and to group 3 in June,
1916. In February, 1917 he was in group 2 and in June, 1917 he was in
group 1. Pupil No. 7 ranked 28 by the tests in February, 1916; 39 in
February, 1917; and 63 in June, 1917. In February, 1916 this pupil was
placed in group 3, in June, 1916, he was in group 4, in February,
1917, in group 5, and in June, 1917 he was still in group 5.



2.     Correlation  Between  Measures  of  Variability 

Is there any constancy in the variability of the individual's
achievement? This question can be studied by finding the amount of
correlation between measures of variability in the different testings.
Two methods of ranking the pupils are used. One method of ranking is
by the extent of their entire range in the eleven tests. The
individual having the smallest range in multiples of Q was ranked one,
least variable, and the pupil having the largest range was ranked
seventy-two, most variable. The other method is by the approximation
of the interquartile range. The ranking was made in the same manner,
the least variable being ranked one.

The pupils were ranked according to variability in each of the three
testings by both of these methods. The three combinations of the
rankings by each method were then correlated. The coefficients are
given in Part A of Table XXXV. The results show consistently a small
positive relation between the amount of variability in the three
testings. Both methods show about the same results.
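The two measures of individual variability can be sketched as below. The text does not state here how the interquartile range was approximated from eleven scores, so the use of `statistics.quantiles` (its default "exclusive" method) is an assumption of this sketch; the function name `variability_measures` and the sample scores are hypothetical.

```python
from statistics import quantiles

def variability_measures(q_scores):
    """Return (entire range, interquartile range) of one pupil's scores.

    q_scores: the pupil's scores in multiples of the group Q. The quartile
    method is an assumed stand-in for the study's own approximation.
    """
    ordered = sorted(q_scores)
    full_range = ordered[-1] - ordered[0]
    q1, _, q3 = quantiles(ordered, n=4)
    return full_range, q3 - q1

# Eleven hypothetical scores for one pupil, in Q units.
pupil = [-2.0, -1.0, -0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5]
print(variability_measures(pupil))
```

Ranking the seventy-two pupils by either returned value, smallest first, would then give the variability rankings that are correlated across testings.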

TABLE XXXV

Correlation Between Measures of Variability in the Eleven
Tests at the Different Times They Were Given

                                                            P.E.
                                                      r     of r
(A) Range in Values of Q
      Feb. 1916 with Feb. 1917 ...................   .32    .07
      Feb. 1917 with June 1917 ...................   .27    .07
      Feb. 1916 with June 1917 ...................   .17    .08
    Interquartile Range in Values of Q
      Feb. 1916 with Feb. 1917 ...................   .20    .08
      Feb. 1917 with June 1917 ...................   .29    .07
      Feb. 1916 with June 1917 ...................   .18    .08
(B) Range in Values of Q
      Feb. 1917 with Teachers' Ratings ...........   .20    .08
      June 1917 with Teachers' Ratings ...........   .20    .08
    Interquartile Range in Values of Q
      June 1917 with Teachers' Ratings ...........   .22    .07

In Part B of Table XXXV three of the rankings of Part A are used to
correlate with the teachers' ratings in variability. These were
secured as follows. Each teacher rated every pupil he or she had had
in class as to the character of the work done. Under one of the three
headings (consistent, variable, and erratic) the teacher was asked to
"check either the character of the work in general or the character of
the work in each subject." Variable was to be considered as the step
between consistent and its opposite, erratic. From six to eleven
ratings were thus secured for each pupil. These were turned into per
cents of consistent, variable, and erratic ratings. The percentage of
consistent was weighted by three, the percentage of variable by two,
and the percentage of erratic by one. From the totals of these
weighted percentages the pupils were ranked from one to seventy-two.
The largest percentage was ranked one, least variable, and the
smallest seventy-two, most variable.
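The weighting of the teachers' ratings can be illustrated with a small computation. The weights of three, two, and one come from the description above; the function name `consistency_score`, the pupil labels, and the counts of ratings are hypothetical.

```python
def consistency_score(n_consistent, n_variable, n_erratic):
    """Weighted total of the percentages of each kind of rating.

    Weights: consistent x 3, variable x 2, erratic x 1.
    """
    total = n_consistent + n_variable + n_erratic
    pct_consistent = 100 * n_consistent / total
    pct_variable = 100 * n_variable / total
    pct_erratic = 100 * n_erratic / total
    return 3 * pct_consistent + 2 * pct_variable + 1 * pct_erratic

# Hypothetical counts of (consistent, variable, erratic) ratings per pupil.
ratings = {"A": (6, 2, 0), "B": (2, 4, 2), "C": (1, 2, 5)}
scores = {pupil: consistency_score(*counts) for pupil, counts in ratings.items()}

# Largest weighted total is ranked one (least variable).
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Because the counts are first turned into percentages, pupils with six ratings and pupils with eleven ratings are placed on the same scale before ranking.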

The  coefficients  in  Part  B  resulted  from  correlating  these  rank- 
ings with  the  rankings  made  from  the  range  in  the  eleven  tests. 
Here  again,  although  not  high,  the  coefficients  show  consistently 
a  small  amount  of  positive  relation. 

3.     Correlation  Between  Measures  of  Ability 
AND  Variability 

Having  studied  the  resemblance  between  measures  of  ability 
and  the  resemblance  between  measures  of  variability  the  ques- 
tion naturally  follows:  What  is  the  relation  between  ability 
and  variability?  The  results  from  such  correlations  are  shown 
in  Table  XXXVI. 

TABLE XXXVI

Correlation Between Measures of Ability and Variability in the
Eleven Tests. (Highest Ability and Least Variability Ranked One.)

                                                  Feb.    Feb.    June
                                                  1916    1917    1917
Ability by Median Rank in Values of Q
  with Variability by Range in Values of Q ...     .19     .31     .33
Ability by Median Rank in Values of Q
  with Variability by Inter-Quartile Range
  in Q .......................................     .26     .45     .43

Composite Ability by Average Rank in Three Testings
  with Composite Variability by Range in Three Testings ......  .39
Composite Ability by Average Rank in Three Testings
  with Composite Variability by Inter-Quartile Range
  (approximation) in Three Testings ..........................  .55
The rankings used in the two preceding topics were used to find the
correlation between the ability and the variability of these pupils.
Variability is correlated with ability for each of the three testings,
first, by using the entire range in values of Q as the measure of
variability, and second, by using the approximation of the
interquartile range as the measure of variability. The median rank in
values of Q is used as the measure of ability in both cases. The pupil
having the highest median score is ranked one in ability and the pupil
having the smallest range is ranked one in variability. The
correlations result in positive coefficients in all cases, and
interestingly, in greater amounts of relation when the coefficients
secured in the later testings are compared with the coefficients of
the first testing.
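A correlation between two rankings of this kind can be sketched with Spearman's rank-difference formula, rho = 1 - 6Σd²/(n(n² - 1)). This is offered only as an illustration: the formula actually used in the study is not preserved in this text and may differ, and the two short rankings below are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-difference correlation for two rankings without ties.

    Used here as a stand-in; it is not claimed to be the study's formula.
    """
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical ranks of five pupils: rank one is highest ability in the
# first list and least variability in the second.
ability = [1, 2, 3, 4, 5]
variability = [2, 1, 3, 5, 4]
rho = spearman_rho(ability, variability)
print(round(rho, 2))
```

A positive value, as here, means that the pupils ranked higher in ability also tend to be ranked as less variable, which is the direction of relation reported in Table XXXVI.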

From  the  relations  shown  by  the  coefficients  of  correlation  in 
this  section  the  following  summary  may  be  made.  Higher  co- 
efficients of  correlation  were  obtained  by  ranking  these  pupils 
by  their  average  achievement  than  by  ranking  them  by  their 
median  achievement.  When  the  number  of  tests  given  is  rela- 
tively small  the  median  is  affected  much  more  by  slight  devia- 
tions than  is  the  average.  The  teachers'  ratings  in  variability 
show  positive  coefficients  when  correlated  with  the  variability 
as  shown  by  the  tests.  The  relation  between  ability  and  varia- 
bility as  expressed  by  coefficients  of  correlation  is  not  great  but 
is  consistently  positive.  It  was  greater  in  the  later  testings 
than  in  the  first  testing. 


VIII 

CONCLUSIONS 

The questions asked in connection with the statement of the problem
may be grouped under four headings. Although all of these questions
have not been fully answered, the following conclusions seem to be
justified in view of the results obtained by testing at three
different times, during a period of a year and a half, seventy-two
junior high school boys with a group of eleven standardized scales and
tests.

A.  Concerning methods of comparing or equating individual
measures of achievement.

The  method  of  comparing  the  scores  of  an  individual  by  ranks 
from  highest  to  lowest  in  a  group  is  not  satisfactory  for  the  pur- 
pose of  diagnosing  individual  achievements.  By  this  method 
much  of  the  refinement  of  the  original  measures  is  lost.  The 
method  of  transmuting  the  original  scores  into  multiples  of  a 
measure  of  variability  of  the  group  produces  more  reliable  re- 
sults because  practically  all  of  the  refinement  of  the  original 
measures  is  preserved.  The  semi-interquartile-range  or  the  av- 
erage deviation  is  to  be  preferred  to  the  mean  square  deviation 
as  a  measure  of  variability  for  this  kind  of  statistical  treatment 
as  the  latter  weights  too  heavily  the  extreme  and  erratic  scores. 
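The transmutation of raw scores into multiples of Q can be sketched as follows. Q is taken as the semi-interquartile range of the group, as stated above; measuring each score from the group median is an assumption of this sketch, as are the function names and the eleven sample scores.

```python
from statistics import median, quantiles

def semi_interquartile_range(scores):
    """Q: half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(scores, n=4)
    return (q3 - q1) / 2

def to_q_units(scores):
    """Express each raw score as a deviation from the group median in units of Q.

    The choice of the group median as reference point is an assumption.
    """
    center = median(scores)
    q = semi_interquartile_range(scores)
    return [(s - center) / q for s in scores]

# Hypothetical raw scores of a group of eleven pupils on one test.
group = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
print(semi_interquartile_range(group))
print([round(v, 2) for v in to_q_units(group)])
```

Unlike ranks, the transmuted values keep the relative sizes of the gaps between scores, and because Q depends only on the quartiles it is little affected by a few extreme scores.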

B.  Concerning  the  amount  and  distribution  of  individual 
variability. 

The  variability  of  the  individual  in  these  tests  is  a  large  frac- 
tion of  the  variability  of  the  group.  The  average  amount  of  in- 
dividual variability,  measured  in  terms  of  the  Q,  is  82  per  cent 
of  the  group  variability.  This  is  evidence  of  the  unreliability 
of  one  or  a  few  tests  for  the  purpose  of  educational  prognosis. 

The  tests  used  in  the  second  and  third  testings  are  not  in  all 
cases  repetitions  of  the  same  tests  or  tests  comparable  in  the 
amount  of  absolute  variability,  but  the  results  of  certain  tests 
which  are  comparable  tend  to  show  that  the  absolute  amount  of 
group  variability  is  about  the  same  in  all  testings.  The  indi- 
vidual variability  in  terms  of  the  group  variability  is  the  same 
in  the  first  and  second  testings,  and  slightly  greater  in  the  third. 
The form of distribution of the achievements of the individual
approximates the normal surface of frequency. The mode is
distinctly pronounced. The chief divergence from the normal
curve is skewness downward from the median.

The  average  range  in  the  achievements  of  the  individual  is 
4.78Q  in  terms  of  the  Q  of  the  group.  The  average  range  in 
achievements  above  the  individual  medians  is  1.94Q,  and  the 
average  range  below  the  individual  medians  is  2.84Q. 

The lowest ranking pupils are the most variable in their
achievements. The variability of the second tertile is greater than
the variability of the first tertile, and the variability of the third
tertile is greater than that of the second tertile in each testing.
Measured by the Q of the group the average of the three testings shows
that the variability of the highest tertile is .69Q, that of the
middle tertile, .78Q, and that of the lowest tertile, .99Q.

The overlappings of the divisions of the group show marked amounts of
difference between the median achievement of the different quartiles.
In terms of the Q of the group the median achievement of the second
quartile is .95Q lower than that of the first; that of the third, .65Q
lower than that of the second; and the median achievement of the
fourth is .84Q lower than that of the third. Therefore the pupils of
this group who are most variable in their achievements are also
distinctly lowest in achievements as measured by these tests.

For  the  purpose  of  individual  diagnosis  it  would  be  of  ad- 
vantage to  have  more  tests  scaled  from  the  zero  point  and  stand- 
ardized in  variability  either  by  grade  or  by  age  of  the  pupil. 

C.     Concerning  extremely  variable  or  erratic  scores. 

Considering  as  erratic  all  scores  at  a  distance  of  3Q  or  more 
in  each  direction  from  the  median  score  of  the  individual  the 
average  number  of  erratic  scores  for  each  testing  is  4.5  per  cent 
of  the  total  number  of  scores  for  each  testing.  Twenty-nine  per 
cent  of  the  erratic  scores  are  plus  and  71  per  cent  are  minus. 

Spelling caused more erratic scores than any other test with the
exception of Algebra, Addition and Subtraction, which was in process
of construction and which was given only once. In spelling and the
three rate tests (opposites, mixed relations, and easy directions) all
but one of the erratic scores are minus. These four tests contain 46
per cent of the erratic scores. In the remaining seven tests the total
number of erratic scores plus is 30 and the total number minus is 29.

There are 108 erratic scores in the three testings: 24 per cent in
the first, 35 per cent in the second, and 41 per cent in the third
testing. In the first testing 35 per cent of the boys made one or more
erratic scores; in the second testing, 42 per cent; and in the third
testing, 47 per cent. The distribution of erratic scores among the
tertiles is as follows: 22 per cent are in the first tertile; 29 per
cent are in the second; and 49 per cent are in the third tertile.
Using the average number of boys who made erratic scores it is found
that 24 per cent are in the first tertile; 32 per cent are in the
second; and 44 per cent are in the third tertile. Therefore the
results of this study show a noticeable increase in the number of
pupils making erratic scores in the later testings and a slight
increase in the number of erratic scores per pupil in the second and
third testings.
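The percentages in this part of the conclusions can be checked against one another with a little arithmetic. The counts per testing below are inferred by rounding the stated shares of the 108 erratic scores; they are not given in the text. The figure of 72 x 11 scores per testing is likewise an assumption, since not every test was necessarily given at every testing.

```python
TOTAL_ERRATIC = 108
shares = {"first": 0.24, "second": 0.35, "third": 0.41}

# Inferred whole-number counts per testing (an assumption, not source data).
counts = {name: round(TOTAL_ERRATIC * share) for name, share in shares.items()}

# Percentage increase from one testing to the next.
growth_second = round(100 * (counts["second"] / counts["first"] - 1))
growth_third = round(100 * (counts["third"] / counts["second"] - 1))

# Average share of erratic scores per testing, assuming 72 pupils x 11 tests.
scores_per_testing = 72 * 11
average_share = round(100 * (TOTAL_ERRATIC / 3) / scores_per_testing, 1)

print(counts, growth_second, growth_third, average_share)
```

Under these assumptions the inferred counts sum to 108 and reproduce the increases of 46 and 16 per cent and the average of 4.5 per cent reported in the study.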

In  this  group  of  pupils  76  per  cent  made  one  or  more  erratic 
scores.  Forty-two  per  cent  made  erratic  scores  in  one  testing 
only;  22  per  cent  in  two  testings;  and  12  per  cent  in  all  three 
testings.  Of  the  pupils  who  made  erratic  scores  55  per  cent 
made  them  in  the  minus  direction  only;  16  per  cent  in  the  plus 
direction  only;  and  29  per  cent  in  both  the  plus  and  minus  di- 
rections. No  distinct  types  of  variation  are  found  in  this  group 
of  pupils. 

A  re-examination  under  closely  controlled  conditions  of  a  few 
boys  who  made  the  most  variable  scores  in  spelling  and  in  the 
rate  tests  produced  an  average  reduction  of  25  per  cent  in  the 
number  of  erratic  scores  on  the  basis  of  the  total  number  of 
scores  in  the  re-examination. 

Five possible factors in the causation of erratic scores were
studied. They are: the nature of the tests used, the administration
of the tests, accidental or unusual occurrences, statistical treatment
of the results, and the ability of the individual in different traits.
The nature of the tests and the statistical treatment of the results
seem to have magnified the variability of a relatively small number of
scores. The administration of the tests in so far as it can be judged
was an unimportant factor. Accidental or unusual occurrences probably
caused a small proportion of the erratic scores. From the evidence of
this study it appears that the ability of the pupil in different
traits was the greatest factor in the causation of scores that varied
3Q or more from the individual's median.

D.  Concerning  the  relation  between  measures  of  ability,  be- 
tween measures  of  variability,  and  between  measures  of  ability 
and  variability. 

The  coefficients  of  correlation  between  the  different  testings 
show  that  from  the  results  of  the  first  testing  the  average  achieve- 
ment of  these  boys  in  similar  tests  a  year  later  and  a  year  and  a 
half  later  could  be  predicted  with  a  rather  high  degree  of  ac- 
curacy. However,  the  results  of  the  tests  and  the  judgments 
of  the  teachers  agree  in  showing  a  very  great  amount  of  change 
in  the  ranking  of  certain  individuals  among  the  group  in  the 
later  testings.  For  the  purpose  of  individual  diagnosis  the  re- 
sults obtained  from  a  single  testing  with  such  a  group  of  tests 
should  be  considered  as  indices  of  individual  ability  which  will 
be  valid  for  varying  lengths  of  time.  Such  results  should  be 
supplemented  and  checked  by  repetitions  of  the  same  or  similar 
tests.  School  organization  should  be  flexible  enough  to  allow 
for  a  shifting  among  groups  for  instruction  commensurate  with 
the  relative  gain  or  loss  in  ability  on  the  part  of  certain  indi- 
viduals. 

The  correlation  between  the  first  and  third  testings  which 
are  a  year  and  a  half  apart  in  point  of  time  is  about  10  per  cent 
less  than  the  correlation  between  either  the  first  and  second 
or  the  second  and  third  testings.  This  supplements  the  evidence 
already  found  showing  that  the  pupils  vary  more  in  their 
achievements  the  longer  they  remain  in  school. 

The  amount  of  correlation  between  measures  of  variability  in 
the  different  testings,  although  small,  is  positive  in  all  cases. 

The  coefficient  of  correlation  between  composite  ability  by 
average  rank  in  the  three  testings  and  composite  variability  by 
interquartile  range  (approximation)  in  the  three  testings  is  .55. 
This  seems  to  indicate  that  there  was  a  considerable  amount  of 
relation  between  the  ability  of  these  pupils  to  achieve  in  these 
tests  and  the  consistency  or  lack  of  variability  in  their  achieve- 
ments. 
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TABLE  XXXIX 
VITA

The  writer  of  this  dissertation,  Chester  Arthur  Buckner, 
was  born  near  Highland  Center,  Iowa,  January  28,  1885.  He 
attended  the  village  school  at  Highland  Center  intermittently 
until  1900  when  he  received  a  common  school  diploma  from 
Wapello  County,  Iowa.  He  entered  the  high  school  at  Ottumwa, 
Iowa,  in  1901  and  was  graduated  in  1905.  His  undergraduate 
work,  and  part  of  his  graduate  work,  was  done  at  the  State 
University  of  Iowa,  Iowa  City,  Iowa,  where  he  received  the 
degree  of  Bachelor  of  Arts  in  1909  and  the  degree  of  Master  of 
Arts  in  1911.  He  was  a  student  at  Teachers  College  for  a  year 
and  a  half  beginning  February,  1916.  The  writer's  teaching 
experience  includes  one  year  as  teacher  of  mathematics  and 
civics  in  the  high  school  at  Clinton,  Iowa;  two  years  as  head  of 
the  department  of  English  in  the  high  school  at  Manila,  Philip- 
pine Islands;  one  year  and  one-half  and  two  summer  sessions  as 
assistant  professor  of  education  in  the  University  of  Kansas; 
and  one  year  in  charge  of  educational  measurements  in  The  Lin- 
coln School  of  Teachers  College. 
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